

Received: 9 December 2018 Revised: 25 February 2020 Accepted: 28 February 2020

DOI: 10.1002/sim.8532

TUTORIAL IN BIOSTATISTICS

STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1—Basic theory and simple methods of adjustment

Ruth H. Keogh,1 Pamela A. Shaw,2 Paul Gustafson,3 Raymond J. Carroll,4,5 Veronika Deffner,6 Kevin W. Dodd,7 Helmut Küchenhoff,8 Janet A. Tooze,9 Michael P. Wallace,10 Victor Kipnis,11 Laurence S. Freedman12,13

1 Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
2 Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
3 Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
4 Department of Statistics, Texas A&M University, College Station, Texas, USA
5 School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
6 Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
7 Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
8 Department of Statistics, Statistical Consulting Unit StaBLab, Ludwig-Maximilians-Universität, Munich, Germany
9 Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
10 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada

Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can impact strongly on results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, with emphasis on the classical, linear, and Berkson models and on the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimating distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy Nutrition (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.

KEYWORDS

Berkson error, classical error, differential error, measurement error, misclassification, nondifferential error, regression calibration, sample size, SIMEX, simulation extrapolation

Published 2020. This article is a U.S. Government work and is in the public domain in the USA.

Statistics in Medicine. 2020;1–35. wileyonlinelibrary.com/journal/sim


11 Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
12 Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel
13 Information Management Services Inc., Rockville, Maryland, USA

Correspondence
Laurence S. Freedman, Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer 52621, Israel.
Email: [email protected]

Funding information
Natural Sciences and Engineering Research Council of Canada (NSERC), Grant/Award Number: RGPIN-2019-03957; Patient Centered Outcomes Research Institute (PCORI), Grant/Award Number: R-1609-36207; National Institutes of Health (NIH), Grant/Award Numbers: NCI P30CA012197, U01-CA057030, R01-AI131771

1 INTRODUCTION

Measurement error and misclassification of variables are frequently encountered in epidemiology and involve variables of considerable importance in public health such as smoking habits,1 dietary intakes,2 physical activity,3 and air pollution.4 Their presence can impact strongly on the results of statistical analyses that involve such variables. However, more often than not, investigators do not pay serious attention to the biases that can result from such mismeasurement.5 We provide in two parts an overview of the types of error that can occur, their impacts on results from analyses, and statistical methods to mitigate the biases that they cause. Throughout, we endeavor to retain a utilitarian approach and to relate theory to practice.

Our focus throughout is on studies which are aimed either at describing the distribution of variables in a population or at understanding the relationships between variables, that is, on etiology. In studies in which the aim is instead to derive a prediction model, the considerations surrounding error-prone variables can be quite different. We also focus on relationships between a single outcome and covariates, excluding longitudinal data modeling and also survival problems with time-varying covariates.

In this first part, we provide a description of the most commonly used statistical models for measurement error and misclassification (Section 2), and the impact of such errors on estimated coefficients in regression models frequently used in epidemiology (Section 3). We describe ancillary studies needed to provide information about the measurement error model and an overview of reference measurements available for some of the most common exposures encountered in epidemiology that are measured with error (Section 4). Study design issues that are impacted by measurement error are discussed, and we outline calculations for determining sample size requirements and power (Section 5). We then present two of the simpler—and more commonly used—methods for adjusting for the bias caused by measurement error in estimated regression coefficients: regression calibration (RC) and simulation extrapolation (SIMEX, Section 6). Available software for implementing such methods is listed (Section 7). The methods are illustrated by examples from a real study, somewhat simplified so as to retain clarity.


There have been a number of tutorial articles, chapters, and books on measurement error. The book of Carroll et al6 provides a wide-ranging statistical account of issues of measurement error. The book of Buonaccorsi7 provides background and detail on many of the topics that we discuss in this article, including misclassification and measurement error in linear models, and Gustafson's book8 focuses on Bayesian analysis methods and considers both measurement error and misclassification. The recent book of Yi9 focuses especially on survival analysis and longitudinal settings, which we do not cover in our two articles. The tutorial article of Keogh and White10 discusses several methods for measurement error correction with a focus on nutritional epidemiology, Armstrong11 provides an overview of the impact of measurement error in studies of environmental and occupational exposures, and there have been many others focusing on specific aspects. A detailed account of issues surrounding error in covariates is given in a chapter by Buzas et al,12 covering types of error, study design considerations, and methods of analysis, with an emphasis on likelihood-based methods.

Our two articles provide a comprehensive overview of measurement error issues, from describing types of error and their impact to study design and analysis, with an emphasis on providing practical guidance. Specific topics covered in this first part that do not appear in the chapter of Buzas et al12 include errors in the outcome variable, the effects of measurement error on estimating distributions, clarifications concerning the effects of Berkson error in multivariable models, a review of reference measurements used in different areas of epidemiology, discussion of sample size for ancillary studies, and an overview of software for RC and SIMEX. In Part 2, we additionally consider a number of advanced topics, including additional methods to address covariate error, methods to estimate the distribution of an error-prone covariate, methods to address outcome error, mixtures of misclassification and classical error, mixtures of Berkson and classical error, and approaches to handle imperfect validation data or the absence of validation data.

2 THE MAIN TYPES OF ERROR

In the statistical and epidemiological literature, one can find two separate terms for errors in variables: measurement error and misclassification. The former term is typically used for continuous variables, such as dietary intakes, and the latter term is used for categorical (including discrete) variables such as level of educational attainment. In this section, we describe the types of error that occur in continuous and categorical variables.

Suppose that we are interested in learning the regression relationship between a scalar outcome variable Y and variables X and Z. We will call the latter, X and Z, covariates. They could both be vectors, but for simplicity we start with X being scalar. We suppose that whereas the Z variables are measured exactly, X is measured with error, with the true value of X being unobserved. We denote the error-prone observed variable by X* (although sometimes it is denoted by W in the statistical literature). To understand and measure the impact of the mismeasurement of X, we have to know the relationship of the observed X* to the unobserved X. This relationship is specified in a statistical model.

2.1 Measurement error in covariates (continuous variables)

There are several sources of measurement error in continuous variables. One is instrument error, arising due to limitations of the instruments used to measure the exposure. Another is error due to self-reporting. It is also common that the exposure of interest is an underlying "usual level," or average level over a defined period, of a quantity that fluctuates almost continually over time. This applies particularly when the exposure is a biological measurement (such as weight or blood pressure). In this case the true exposure may never be observed, and we discuss this setting further in Section 4.2, where issues of study design, including reproducibility, are considered.

When X and X* are continuous, their relationship is defined by a measurement error model. The simplest case is known as the classical measurement error model13 and is defined by:

X∗ = X + U, (1)

where U is a random variable with mean 0, and is independent of X. Classical errors have been assumed quite frequently, although not universally, in laboratory and objective clinical measurements, for example, when modeling the relationship between serum cholesterol14 or blood pressure and heart disease.15 An extension of this model that is more suitable for some measurements, particularly self-reports, is the linear measurement error model,13 in which

X∗ = 𝛼0 + 𝛼X X + U. (2)


This model describes a situation where the observed measurement includes both random error (U, with mean 0 and independent of X) and systematic error, allowing the latter to depend on the true value X. Classical error is included as a special case of this more general model (2), occurring when 𝛼0 = 0 and 𝛼X = 1. In model (2), 𝛼0 can be said to quantify location bias—bias independent of the value of X—and 𝛼X quantifies the scale bias—bias that depends proportionally on the value of X. The linear measurement error model has been used widely to describe the error in self-reports of dietary intake (eg, Freedman et al16), and in some versions the parameter 𝛼0 has been specified as a random variable that varies across individuals.17 Further extensions of the linear measurement error model allow X* to depend also on other variables Z̃ that may, or may not, be the same as the exactly measured variables Z in the outcome model.18 Another extension is to allow the variance of U to depend on X (eg, Spiegelman et al19).
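As a concrete illustration (a sketch, not from the article; the sample size, seed, and parameter values below are arbitrary), data can be generated under the classical model (1) and the linear model (2) and the basic consequences of each checked directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(4.0, 1.0, n)          # true exposure X

# Classical error, model (1): X* = X + U, U independent of X with mean 0
u = rng.normal(0.0, 0.5, n)
x_star_classical = x + u

# Linear measurement error, model (2): X* = a0 + aX * X + U
a0, aX = 1.0, 0.6                    # illustrative location and scale bias
x_star_linear = a0 + aX * x + rng.normal(0.0, 0.5, n)

# Classical error leaves the mean unchanged but inflates the variance;
# linear error shifts the mean by a0 and rescales the X-part by aX.
print(x.mean(), x_star_classical.mean())   # both near 4.0
print(x.var(), x_star_classical.var())     # near 1.0 vs near 1.25
print(x_star_linear.mean())                # near a0 + aX * 4.0 = 3.4
```

The inflated variance of the classical-error measurement is what drives the attenuation discussed in Section 3.1.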

Of course, the relationship between X* and X may, in practice, take forms other than the linear model (2). There is a large literature on such models (see, for example, Chen et al20), but relatively few applications of them in epidemiology, where the most common approach has been to use a transformation of the variables to recover, at least approximately, the linearity of the relationship. The most common transformation employed is the logarithmic,21 but power transformations have also been used.22 Assuming classical error on the log scale, that is, log X* = log X + U, corresponds to multiplicative error.

In models (1) and (2), the measurement X* is viewed as arising from the true value X together with a random error term U that is independent of X. Such a model is suitable for many measurements that are employed in epidemiology. However, in some circumstances, it is instead appropriate to view the true value X as arising from the measured value X* together with an error U that is independent of X*. This occurs, for example, when all the individuals in specific subgroups are assigned the average value of their subgroup. In that case the error model should be written:

X = X∗ + U, (3)

where U has mean 0 and is independent of X*. This is known as the Berkson error model.23 It occurs frequently in occupational epidemiology (eg, Oraby et al24) and in air pollution studies (eg, Goldman et al25). For example, in a study of second-hand smoke exposure, the exposure of workers to second-hand smoke in different factories may be assigned to be the average level of airborne nicotine obtained from monitoring devices placed in each factory. In air pollution studies, the exposure of individuals living in different geographical areas to certain particles may be assigned as the value from a fixed local air pollution monitor. Berkson error also arises, although this is often not appreciated, when scores are assigned to individuals on the basis of a prediction or calibration equation (eg, Tooze et al26). Suppose that the observed exposure is obtained as the expected outcome (fitted value) from a prediction model for the true exposure based on predictor variables C, X = 𝜃0 + 𝜃C⊤C + 𝜖, giving X∗ = 𝜃0 + 𝜃C⊤C. In the event that the prediction model parameters are based on a large sample, it follows approximately that X = X* + 𝜖; that is, the scores from the prediction model are subject to Berkson error. Tooze et al26 give an example where the exposure of interest is basal energy expenditure, and the observed value is an estimate obtained using a previously derived prediction equation based on an individual's age, sex, height, and weight. Variables obtained on the basis of prediction or calibration equations should therefore be handled accordingly in analyses that incorporate them. See Section 3 for a discussion in the context of outcome variables measured with error.

The Berkson error model as defined in (3) is additive. However, as for the classical error model, a multiplicative form of the Berkson error model may also be considered, meaning that the additive form holds for the log-transformed variables: log X = log X* + U.

When X* and U are normally distributed, the Berkson error model can be re-expressed as a special case of the linear measurement error model (2), with 𝛼X = var(X*)/(var(X*) + var(U)) and 𝛼0 = E(X*)(1 − 𝛼X). In the same vein, under these same circumstances, the linear measurement error model (2) can be re-expressed as a linear regression of X on X*.
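This re-expression can be checked by simulation. In the following numpy sketch (illustrative, not from the article), data are generated under the Berkson model (3) with var(X*) = var(U) = 1, so the implied linear-model slope is 𝛼X = 1/(1 + 1) = 0.5:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x_star = rng.normal(0.0, 1.0, n)     # assigned (observed) value X*
u = rng.normal(0.0, 1.0, n)          # Berkson error, independent of X*
x = x_star + u                       # true value, model (3)

# Under normality, model (3) implies X* = a0 + aX * X + error with
# aX = var(X*) / (var(X*) + var(U)); here aX = 0.5.
slope = np.cov(x_star, x)[0, 1] / np.var(x)
print(slope)                          # close to 0.5
```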

An important issue is whether or not the measurement error U contains any extra information about the outcome Y. When the error contains no extra information about Y, we call the error nondifferential. We express this concept formally by saying the error is nondifferential with respect to outcome Y if the distribution of Y conditional on X, X*, and Z (denoted p(Y | X, X*, Z)) is equal to p(Y | X, Z).6(p36) Otherwise the error is differential with respect to Y. Nondifferential error with respect to Y can also be expressed in terms of the conditional distribution of X*, p(X* | X, Y, Z) = p(X* | X, Z), and in terms of conditional independence between X* and Y given X and Z, that is, p(X*, Y | X, Z) = p(X* | X, Z)p(Y | X, Z). Error is often nondifferential when variables are measured at baseline in a prospective cohort study, since the measurement precedes the outcome, often by a lengthy period (eg, Willett27(p8)). Differential error can occur in case-control studies, when measurements may take place after the outcome has occurred. One form of differential error is the well-known problem of recall bias that can occur in case-control studies (eg, Willett27(p7)). Here, the errors in responses to questions, such as "how many cigarettes did you smoke per week?," are dependent on whether the person is a case or a control (the outcome), for example, a person who has or has not been diagnosed with cancer. In general, in analysis, the effects of nondifferential error are easier to correct than the effects of differential error.

Error can also be nondifferential with respect to both Y and Z, whereby X* is known to be conditionally independent of (Y, Z) given X. This is a stronger form of nondifferentiality than the definition given in the previous paragraph. In fact, this stronger form holds for models (1)-(3) if the random term U is independent of the variables Z as well as being independent of Y and X (models (1) and (2)) or X* (model (3)).

2.2 Misclassification in covariates (categorical variables)

As in Section 2.1, we are interested in the regression relationship between Y and covariates (X, Z), when the available data are Y, X*, and Z. Now, however, we take up the case where X and X* are both categorical variables. For instance, in an observational, questionnaire-based study, some participants may, wittingly or unwittingly, check the wrong box on a yes/no item. As mentioned earlier, when X is categorical the discrepancy between X and X* is typically referred to as misclassification rather than measurement error.

The simplest case to start with is that in which X and X* are both binary, and the misclassification is nondifferential. For current purposes we assume the stronger form of nondifferentiality, which is tantamount to assuming that the "noise" transforming the latent X into the observable X* is ignorant of the values of Y and Z. Then, mimicking the linear measurement error model of Equation (2), we can view the regression relationship

E(X∗ | X, Z, Y) = 𝛼0 + 𝛼X X, (4)

as describing the extent of the misclassification. This serves as a reminder that misclassification is closely related to measurement error. However, unlike the situation with measurement error, (𝛼0, 𝛼X) are necessarily constrained to ensure that (4) remains between zero and one. In fact, it is more common and intuitive to see nondifferential misclassification expressed in terms of the sensitivity, Sn = Pr(X* = 1 | X = 1, Z, Y) = Pr(X* = 1 | X = 1) = 𝛼0 + 𝛼X, and specificity, Sp = Pr(X* = 0 | X = 0, Z, Y) = Pr(X* = 0 | X = 0) = 1 − 𝛼0. See Gustafson8(sec. 3.1) or Buonaccorsi7(sec. 2.3) for a more detailed description of this framework.
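The mapping between (𝛼0, 𝛼X) and (Sn, Sp) can be illustrated with a small simulation (a sketch, not from the article; the prevalence and error rates are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
sn, sp = 0.90, 0.95                  # illustrative sensitivity and specificity
a0, aX = 1 - sp, sn - (1 - sp)       # so that Sn = a0 + aX and Sp = 1 - a0

x = rng.binomial(1, 0.3, n)          # true binary exposure, Pr(X = 1) = 0.3
# Nondifferential misclassification: Pr(X* = 1 | X) = a0 + aX * X
x_star = rng.binomial(1, a0 + aX * x)

est_sn = x_star[x == 1].mean()       # estimates Pr(X* = 1 | X = 1)
est_sp = 1 - x_star[x == 0].mean()   # estimates Pr(X* = 0 | X = 0)
print(est_sn, est_sp)                # close to 0.90 and 0.95
```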

Following from this, when choosing a model to represent differential misclassification, adding terms involving Y and/or Z to the right-hand side of (4) is unappealing, since a complicated constrained parameter space ensues. Rather, using an appropriate link function, for example the logit link function, is recommended.

In principle, the idea of Berkson error also carries over to the misclassification setting. That is, there could be situations where the conditional distribution of X given X* is the fundamental descriptor of the misclassification. This is seen less often in practical applications, but could arise in an occupational hygiene context, where group-based exposure assessment is used (see, for example, Tielemans et al28). For instance, all workers in a group, based on a common work location, are assigned the same exposure status, either X* = 0 for unexposed or X* = 1 for exposed; but misclassification arises, because within that group some workers' true exposure X will differ from the assigned value X*. Berkson misclassification also occurs with pooled samples in the diagnostic setting (see, for example, Peters et al29).

Regardless of the type of misclassification that can reasonably be assumed, when X and X* are both binary, the distribution of X given X* is usually referred to in terms of predictive values (or reclassification probabilities), namely the positive predictive value, PPV = Pr(X = 1 | X* = 1), and the negative predictive value, NPV = Pr(X = 0 | X* = 0). In the typical non-Berkson situation, Sn and Sp are taken as characterizing the misclassification, with PPV and NPV then being determined by Sn, Sp, and the prevalence of X, Pr(X = 1). In the atypical Berkson situation, PPV and NPV would characterize the misclassification, with Sn and Sp then being determined by PPV, NPV, and the prevalence of X*, Pr(X* = 1). See Buonaccorsi7(sec. 2.3) for more details.
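In the typical non-Berkson situation, the determination of PPV and NPV from Sn, Sp, and the prevalence follows from Bayes' theorem. A short Python sketch (illustrative values, not from the article):

```python
def predictive_values(sn, sp, prev):
    """PPV and NPV implied by sensitivity, specificity, and prevalence Pr(X = 1)."""
    # Bayes' theorem: PPV = Pr(X=1, X*=1) / Pr(X*=1), and similarly for NPV
    ppv = sn * prev / (sn * prev + (1 - sp) * (1 - prev))
    npv = sp * (1 - prev) / (sp * (1 - prev) + (1 - sn) * prev)
    return ppv, npv

ppv, npv = predictive_values(sn=0.90, sp=0.95, prev=0.10)
print(round(ppv, 3), round(npv, 3))   # 0.667 0.988
```

Note how a highly specific measure can still have a modest PPV when the exposure is rare, since false positives then come from the large unexposed group.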

An interesting form of misclassification arises when the binary X* is created by thresholding a continuous variable measured with error, in which case X would correspond to thresholding the continuous variable measured without error. That is, say X = I{S > c} and X* = I{S* > c}, where S* is a nondifferential (with respect to Y and Z) error-prone measurement of a continuous variable S. In a somewhat counterintuitive finding, Flegal et al30 demonstrated that even if the continuous measurement S* has nondifferential error, the binary measure X* may have differential misclassification (see also Wacholder et al31). That is, conditional independence of S* and (Y, Z) given S does not imply conditional independence of X* and (Y, Z) given X.
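This finding is easy to reproduce by simulation. In the numpy sketch below (illustrative, not from the article), S* has purely nondifferential error, yet among truly exposed subjects the apparent sensitivity of X* depends on the outcome Y:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
s = rng.normal(0.0, 1.0, n)            # true continuous variable S
s_star = s + rng.normal(0.0, 1.0, n)   # nondifferential error in S
y = s + rng.normal(0.0, 1.0, n)        # outcome depends on S, not on the error

x = (s > 0).astype(int)                # X  = I{S  > 0}
x_star = (s_star > 0).astype(int)      # X* = I{S* > 0}

# Among truly exposed subjects (X = 1), Pr(X* = 1 | X = 1) varies with Y:
# differential misclassification, despite nondifferential error in S*.
sens_high_y = x_star[(x == 1) & (y > 1)].mean()
sens_low_y = x_star[(x == 1) & (y < 0)].mean()
print(sens_high_y, sens_low_y)         # clearly higher in the high-Y group
```

Intuitively, conditioning on X = 1 and a high Y selects subjects with S well above the threshold, for whom S* is also likely to exceed it.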


Intermediate between a binary X and a continuous X is the case of a categorical X with more than two categories. An obvious example in epidemiology is where X represents smoking status, with three options: never, former, or current smoker. In this situation, sensitivity and specificity are subsumed by a matrix of classification probabilities, that is, pij = Pr(X* = j | X = i), for i, j = 1, … , k, where k is the number of categories. Work addressing this case includes that of Brenner32 and Wang and Gustafson.33
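For a k-category X, the classification matrix acts on the true category probabilities to give the observed ones: Pr(X* = j) = Σi Pr(X = i) pij. A small numpy sketch with an invented 3 × 3 matrix (illustrative only, not from the article):

```python
import numpy as np

# Illustrative 3-category example (eg, never / former / current smoker):
# rows index true X = i, columns index observed X* = j, p_ij = Pr(X* = j | X = i).
P = np.array([[0.90, 0.07, 0.03],
              [0.10, 0.80, 0.10],
              [0.02, 0.08, 0.90]])

prev = np.array([0.5, 0.2, 0.3])     # Pr(X = i), illustrative

# Nondifferential misclassification: Pr(X* = j) = sum_i Pr(X = i) * p_ij
prev_star = prev @ P
print(prev_star)                      # observed category probabilities: 0.476, 0.219, 0.305
```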

While it has not commonly arisen, one need not limit misclassification to "square" situations, in which X and X* share a common number (and labeling and interpretation) of categories. For example, 2 by 3 misclassification could arise if exposure X is truly binary ("exposed" or "unexposed"), but the exposure assessment by an expert (or consensus of experts) classifies each subject as "likely unexposed," "perhaps exposed," or "likely exposed." A 2 × 3 matrix of classification probabilities would then describe the extent of misclassification. Wang et al34 considered this setting.

2.3 Measurement error and misclassification in an outcome variable

In previous sections, we have discussed error that occurs in measuring covariates X. However, it is also possible that the outcome Y is measured with error, through the error-prone observation Y*. As we will see in Section 3, the effects of measurement error in an outcome variable are different from those in a covariate. The effects of measurement error in an outcome variable have tended to be under-studied relative to error in covariates, but there is now a growing recognition of their potential impact. As we shall see in Section 3.3, particularly important are the effects of differential error in Y, but here the definition of differential error changes. Differential error in Y occurs when Y* is dependent on X conditional on Y (or on Y and Z), that is, p(Y* | Y, X) ≠ p(Y* | Y) or p(Y* | Y, X, Z) ≠ p(Y* | Y, Z).

Applications where a binary or categorical outcome variable is subject to misclassification have also been discussed in the literature (eg, McInturff et al,35 Lyles et al36). This is somewhat more nuanced than the corresponding situation for measurement error in a continuous outcome variable, as we discuss in Section 3.3.

3 EFFECTS OF MEASUREMENT ERROR AND MISCLASSIFICATION ON STUDY RESULTS

In this section, we focus on the impact of measurement error in, or misclassification of, a variable on the results of a study. We consider studies where the main analysis is based upon a model relating an outcome Y to a covariate X via a regression model. We deal first with error in a continuous X, then with misclassification in a categorical X, and finally with error in Y. In the following section on the effects of error in continuous covariates, we build up from the simplest case of a single covariate measured with error, to the case with additional exactly measured covariates, and finally to the case of multiple error-prone covariates. The same is done in Section 3.2, which refers to misclassified covariates. It should come as no surprise that the effects of error differ across the types of error that occur, so this section relies on information we have already presented in Section 2. While it is a common misconception that measurement error in covariates merely leads to attenuation of effect estimates (and is thus considered less of a concern by some), we shall see that this is true only in certain special cases.

3.1 Effects of measurement error in a continuous covariate

3.1.1 Single covariate regression

Suppose that our analysis of the relationship between a continuous outcome Y and covariate X is based on a linear regression model

E(Y | X) = 𝛽0 + 𝛽X X. (5)

However, because of measurement problems we use X* instead of X and therefore explore the linear regression

E(Y |X∗) = 𝛽∗0 + 𝛽∗X X∗. (6)


FIGURE 1 Simulated data on 20 individuals showing the effects of classical error and Berkson error in the continuous covariate X on the fitted regression line. For both plots, Y was generated from a normal distribution with mean 𝛽0 + 𝛽X X (using 𝛽0 = 0, 𝛽X = 1) and variance 1. Classical error plot: X was generated from a normal distribution with mean 0 and variance 1, and X* was generated using X* = X + U. The difference in the slopes in this graph is due to attenuation from the measurement error in X*. Berkson error plot: X* was generated from a normal distribution with mean 0 and variance 1, and X was generated from the normal distribution implied by the Berkson error model X = X* + U. For both error types, var(U) = 3. The small difference in the slopes in this graph is due entirely to sampling error.

In assessing the impact of the measurement error on the regression results we will be interested mainly in:

1. whether and how 𝛽∗X is different from 𝛽X ,
2. whether the precision with which we estimate 𝛽∗X is different from the precision with which we estimate 𝛽X , and
3. whether the usual statistical test of the hypothesis that 𝛽∗X = 0 is or is not a valid test (ie, preserves the nominal significance level) of the hypothesis that 𝛽X = 0.

When measurement error is classical and nondifferential (model (1)), then |𝛽∗X| ≤ |𝛽X|, with equality occurring only when 𝛽X = 0. The measurement error in X* attenuates the estimated coefficient, and any relationship with Y appears less strong. More precisely we can write:

𝛽∗X = cov(Y, X∗)/var(X∗) = cov(Y, X + U)/var(X + U) = cov(Y, X)/{var(X) + var(U)} = [var(X)/{var(X) + var(U)}] × cov(Y, X)/var(X) = 𝜆𝛽X ,

where 𝜆 = var(X)/{var(X) + var(U)} lies between 0 and 1 (0 < 𝜆 ≤ 1) and is called the attenuation factor,6(p. 43) or by some the regression dilution factor.37 See the graph on the left-hand side in Figure 1. Clearly, the larger the measurement error (var(U)), the smaller the attenuation factor, and the greater the attenuation. Besides attenuating the estimated coefficient relating Y to X, classical measurement error also makes the estimate less precise relative to its expected value. In other words, the ratio of the expected value of the estimated coefficient to its standard error (SE) is smaller than under circumstances where X is measured without error, that is, E(𝛽∗X)/SE(𝛽∗X) < E(𝛽X)/SE(𝛽X), and therefore the statistical power to detect whether it is different from zero is lower. Approximately, the effective sample size is reduced by the squared correlation coefficient between X* and X, 𝜌²XX∗, which for this model happens to be equal to the attenuation factor 𝜆.12,38 When measurement error is substantial (eg, 𝜆 < 0.5), its effects on the results of research studies can be profound, with key relationships being much more difficult to detect.

While measurement error in this single covariate setting results in bias and loss of power, any test of the null hypothesis that 𝛽∗X = 0 is a valid test of the hypothesis that 𝛽X = 0, and this is because the relationship 𝛽∗X = 𝜆𝛽X means that 𝛽∗X equals 0 if and only if 𝛽X equals 0.
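These results can be checked numerically. The following Python sketch (not part of the original article; the settings mirror the Figure 1 simulation, so 𝜆 = 1/(1 + 3) = 0.25) regresses Y on an error-prone X* and recovers the attenuation factor:

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000
x = rng.normal(0.0, 1.0, n)                    # true covariate, var(X) = 1
y = x + rng.normal(0.0, 1.0, n)                # true model: beta_0 = 0, beta_X = 1
x_star = x + rng.normal(0.0, np.sqrt(3.0), n)  # classical error, var(U) = 3

slope = np.polyfit(x_star, y, 1)[0]            # OLS slope of Y on X*
lam = 1.0 / (1.0 + 3.0)                        # var(X)/{var(X) + var(U)}
print(round(slope, 3), lam)                    # slope close to 0.25
```

With a large n the naive slope sits very close to 𝜆𝛽X, while the regression of Y on the true X recovers 𝛽X = 1.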

When the error in X* conforms to the linear measurement error model (2), the relationship 𝛽∗X = 𝜆𝛽X still holds12 but 𝜆 need no longer lie between 0 and 1, since now

𝜆 = 𝛼X var(X)/{𝛼²X var(X) + var(U)}. (7)


This means that under the linear measurement error model the effect of the measurement error is no longer necessarily an attenuation. Nevertheless, in nearly all applications 𝛼X is positive, so that negative values of 𝜆 are virtually unknown, and in many applications, var(U) is sufficiently large to render 𝜆 less than 1, even when 𝛼X is less than 1.39

Regardless of the value of 𝛼X, statistical power to detect the relationship between X and Y is reduced and the effective sample size is reduced by a factor approximately equal to 𝜌²XX∗. As with classical measurement error, the test of the null hypothesis that 𝛽∗X = 0 is a valid test of the hypothesis that 𝛽X = 0. Note that the expression for 𝜆 in (7) reverts to the form for classical error when 𝛼X = 1.

When the error in X* has the form of the classical (1) or linear measurement error model (2), but the error is differential, then the above results no longer hold, and an additional contribution to the bias occurs in the estimated coefficient due to the covariance of outcome Y with error U. In particular, the relationship 𝛽∗X = 𝜆𝛽X no longer holds, so that statistical tests of the null hypothesis that 𝛽∗X = 0 generally are not valid tests of the hypothesis that 𝛽X = 0.

When the error in X* is Berkson (model (3)) and nondifferential, the effects on estimation are very different from those described above. In fact, there is no bias, that is 𝛽∗X = 𝛽X !11 See the graph on the right-hand side in Figure 1. However, as with the classical and linear error models, statistical power is reduced, and the effective sample size is again reduced by the factor 𝜌²XX∗.

The results in this section relating to the properties of 𝛽∗X are based on assuming the linear model for Y given X in (5) and a linear relation between X and X*. The linear relation between X and X* is likely to be appropriate for many practical purposes including when X and X* are jointly normal. If the relation between X and X* is nonlinear, then the linear model for Y given X no longer implies a linear model for Y given X* or vice-versa. However, transformations of X* (and X) can often lead to approximate linearity and the above results could then be taken as good working approximations. If the linear model for Y given X is misspecified then the above expressions for the attenuation factor 𝜆 hold asymptotically. If investigators are particularly concerned that these linearity assumptions do not hold, they may try to find a transformation of the X scale on which the assumptions are more tenable, or pursue more advanced methods to accommodate nonlinearity.

3.1.2 Regression with a single error-prone covariate and other exactly measured covariates

Most analytic epidemiological studies involve regression models with several covariates. Suppose we wish to relate outcome Y not only to X but also simultaneously to one or more exactly measured covariates, for example confounders, Z, so that

E(Y |X ,Z) = 𝛽0 + 𝛽X X + 𝛽ZZ, (8)

where Z may be scalar or vector. As before, because of measurement problems we use X* instead of X , and therefore explore the linear regression

E(Y |X∗,Z) = 𝛽∗0 + 𝛽∗X X∗ + 𝛽∗ZZ. (9)

Results concerning 𝛽∗X are similar to those in Section 3.1.1. When the error in X* conforms to a linear measurement error model, the relationship 𝛽∗X = 𝜆𝛽X still holds but now

𝜆 = 𝛼X∣Z var(X|Z)/{𝛼²X∣Z var(X|Z) + var(U)},6

where 𝛼X∣Z is the coefficient of X in a linear measurement error model for X* that includes the variables Z as other covariates. This expression is a simple extension of the formula given for classical measurement error by Carroll et al.6(eq. 3.10) Because 𝛽∗X = 𝜆𝛽X, any test of the null hypothesis that 𝛽∗X = 0 is a valid test of the hypothesis that 𝛽X = 0. Also similar to previous results, statistical power to detect the relationship between X and Y is reduced, but now the effective sample size is reduced by a factor equal to 𝜌²XX∗∣Z, where 𝜌XX∗∣Z is the partial correlation of X* with X conditional on Z.

Note also that in general the coefficients for Z, 𝛽∗Z, are not equal to 𝛽Z, so that estimates of these coefficients from the model with X* substituted for X will also be biased. This bias will occur in the case of each Z-variable, unless the Z-variable is independent of X conditional on the other Z-variables or 𝛽X = 0.6(sec. 3.3) Moreover, due to the form of the bias, any test of the null hypothesis that 𝛽∗Z = 0 is an invalid test of the hypothesis that 𝛽Z = 0. These results highlight the impact of measurement error in covariates that are not the main exposure of interest on the results from an analysis, even when the main exposure is measured without error.
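The bias transmitted to the coefficient of an exactly measured covariate can be seen in a minimal sketch (hypothetical parameter values, not from the article); here the true 𝛽Z is 0, yet regressing Y on X* and Z yields a clearly nonzero Z coefficient:

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)               # X correlated with Z
y = x + rng.normal(size=n)                     # true model: beta_X = 1, beta_Z = 0
x_star = x + rng.normal(0.0, np.sqrt(3.0), n)  # classical error in X only

def ols(outcome, *cols):
    """OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones_like(outcome), *cols])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

b_true = ols(y, x, z)       # approximately (0, 1, 0)
b_star = ols(y, x_star, z)  # Z coefficient now biased away from 0
print(np.round(b_true, 3), np.round(b_star, 3))
```

Under these settings the naive Z coefficient is about 0.525 rather than 0, so the usual test of 𝛽Z = 0 would reject far too often.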


With Berkson error, special conditions are required for 𝛽∗X to equal 𝛽X and for 𝛽∗Z to equal 𝛽Z, namely that the error involved in X*, U (see model (3)), is independent both of Z and of the residual error in regression model (8). Independence of the residual error is the equivalent of the nondifferential error assumption with respect to the outcome Y (p(X*| X , Y , Z) = p(X*| X , Z)). However, independence of U from Z is not guaranteed, and may even be uncommon. Thus, contrary to general perception, Berkson error in a covariate can indeed cause bias in the conventional estimate of the regression coefficients in multiple regression problems, even when the error is nondifferential. The bias caused by Berkson error that is nondifferential but correlated with Z is a multiplicative one, like the bias caused by nondifferential classical or linear measurement error. Thus the usual test of the null hypothesis remains valid. Except in some special cases, correlation of the Berkson error with Z also causes bias in the usual estimate of 𝛽Z, but in this case the bias is additive, so the usual test of the null hypothesis is invalid. The special cases where there is no bias occur when X is independent of a given Z conditional on the other Zs, or when 𝛽X is zero.

3.1.3 Regression with multiple error-prone covariates

Often, particularly in nutritional epidemiology, we wish to relate the outcome Y to two or more variables that are each measured with error. For example, in the case of two such variables, the model will be

E(Y |X1,X2) = 𝛽0 + 𝛽X1X1 + 𝛽X2X2. (10)

Because of measurement problems we observe X∗1 instead of X1, and X∗2 instead of X2, and therefore fit the linear regression model

E(Y |X∗1 ,X∗2 ) = 𝛽∗0 + 𝛽∗X1X∗1 + 𝛽∗X2X∗2 . (11)

Results concerning the vectors of coefficients 𝛽X = (𝛽X1, 𝛽X2)T and 𝛽∗X = (𝛽∗X1, 𝛽∗X2)T are different from those in the earlier sections. When the errors in the X* variables conform to the classical measurement error model, their relationship may still be written in the form 𝛽∗X = Λ𝛽X but now Λ = cov(X + U)−1cov(X), where cov( ) is a 2-by-2 variance-covariance matrix and X and U are vectors (X1, X2)T and (U1, U2)T , the latter denoting the errors in X∗1 and X∗2 , respectively. Writing out this relationship fully we obtain,

𝛽∗X1 = Λ11𝛽X1 + Λ12𝛽X2 (12)
𝛽∗X2 = Λ21𝛽X1 + Λ22𝛽X2,

where Λij (i, j = 1, 2) denotes the (i, j)th element of the 2-by-2 matrix Λ. Thus, the simple proportional relationship between 𝛽∗X1 and 𝛽X1 (or between 𝛽∗X2 and 𝛽X2) seen in earlier sections no longer holds. The diagonal terms of the Λ matrix, Λ11 and Λ22, are still likely to lie between 0 and 1 in most applications, so that, for example, 𝛽∗X1 will usually contain an attenuated contribution from the true coefficient of X1 (Λ11𝛽X1), but 𝛽∗X1 will also be affected by “residual confounding” from the mismeasured X2 (Λ12𝛽X2). Thus, the estimated coefficients in model (11) may be larger or smaller than the true target values in a rather unpredictable manner. Furthermore, any test of the null hypothesis that 𝛽∗X1 = 0 in general will no longer be a valid test of the hypothesis that 𝛽X1 = 0, and similarly for the test of 𝛽X2 = 0. As we will see in Section 6, in these circumstances, valid inference for 𝛽X1 (say) requires knowledge about or estimation of the parameters of the measurement error models for both X∗1 and X∗2 .
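The relationship 𝛽∗X = Λ𝛽X is easy to evaluate numerically. A small sketch with hypothetical covariance parameters (not taken from the article) shows how a null coefficient (𝛽X2 = 0) acquires a nonzero naive counterpart through residual confounding:

```python
import numpy as np

# Hypothetical settings: correlated true covariates, independent classical errors
cov_x = np.array([[1.0, 0.5],
                  [0.5, 1.0]])       # cov(X)
cov_u = np.diag([0.5, 0.5])          # cov(U)
Lambda = np.linalg.inv(cov_x + cov_u) @ cov_x

beta = np.array([1.0, 0.0])          # true coefficients: X2 has no effect
beta_star = Lambda @ beta            # naive large-sample coefficients
print(np.round(Lambda, 3), np.round(beta_star, 3))
```

Here Λ has diagonal entries 0.625 and off-diagonal entries 0.125, so the naive coefficient for X∗2 is 0.125 even though 𝛽X2 = 0, illustrating why the naive test of 𝛽X2 = 0 is invalid.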

When the errors in the X* variables conform to the linear measurement error model, the formulas are similar but slightly more complex, and the concerns over biased estimation and hypothesis testing are identical to those described above for multiple covariates having classical error.

If both covariates are subject to Berkson error then, as in Section 3.1.2, the estimated coefficients are unbiased only in special circumstances. Assuming the errors are nondifferential with respect to outcome Y , one also requires that the Berkson error in X∗1 is independent of X∗2 , and the Berkson error of X∗2 is independent of X∗1 . When bias occurs, it is accompanied by the nonvalidity of the conventional null hypothesis tests that the regression coefficients are zero, as explained in Section 3.1.2.


3.1.4 Common nonlinear regression models

While the results described in Sections 3.1.1-3.1.3 are derived for linear regression models, they serve as good approximations in many circumstances for other regression models that specify a linear predictor function, for example, generalized linear models:

h(E(Y |X ,Z)) = 𝛽0 + 𝛽X X + 𝛽ZZ. (13)

The approximation is usually good if the measurement error is small or the magnitude of 𝛽X remains small to moderate, although the definition of “small to moderate” depends on the form of the model. Details for the commonly used logistic regression model, among others, are given by Carroll et al.6(sec. 4.8)
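As a rough numerical check that the linear-model attenuation carries over approximately to logistic regression (a sketch with arbitrary settings, not from the article; the model is fitted by a simple Newton-Raphson routine):

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 500_000
beta_x = 0.3                                   # moderate true log odds ratio
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-beta_x * x)))
x_star = x + rng.normal(size=n)                # classical error, lambda = 0.5

def logistic_slope(covariate, outcome, iters=25):
    """Slope from a logistic regression with intercept, via Newton-Raphson."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ b))
        w = p * (1.0 - p)
        b += np.linalg.solve((design * w[:, None]).T @ design,
                             design.T @ (outcome - p))
    return b[1]

b_naive = logistic_slope(x_star, y)
print(round(b_naive, 3))   # roughly lambda * beta_x = 0.15
```

With a small-to-moderate 𝛽X the naive log odds ratio lands very close to 𝜆𝛽X, as the linear-model approximation predicts.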

For the Cox proportional hazards model, the relation between 𝛽∗X and 𝛽X (now log hazard ratios) is complicated by the longitudinal nature of the analysis, and by the changing set of individuals remaining at risk.40 However, sometimes in epidemiological problems, event rates are very low and drop-outs are entirely random, so that the covariate distribution of individuals at risk remains stable throughout the follow-up. In such circumstances the results for linear regression models above would provide good approximations.41 Carroll et al6 discuss the impact of measurement error in nonlinear regression models and provide expressions for bias based on higher-order approximations.

3.2 Effects of misclassification in a binary covariate

3.2.1 Single covariate regression

Having considered continuous covariates measured with error, we now turn to the case of binary or categorical covariates that are misclassified. We start with a continuous outcome Y and a single binary covariate X that is misclassified as binary X*, used in models described by Equations (5) and (6). In the situation of a binary covariate, the interpretation of the coefficient 𝛽X is as a difference in the mean outcome between those with X = 1 and those with X = 0. Assuming nondifferential misclassification, as described by sensitivity Sn and specificity Sp, 𝛽∗X is attenuated. The attenuation factor 𝜆 = 𝛽∗X∕𝛽X is determined by Sn, Sp, and Pr(X = 1). For examples, see Figure 2. Given that the positive predictive value (PPV) and negative predictive value (NPV) for X* as an error-prone measurement of X are themselves determined by Sn, Sp, and Pr(X = 1), Gustafson8(sec. 3.1) argues that the attenuation factor is most intuitively expressed as

𝜆 = NPV + PPV − 1. (14)

This gives a direct expression that shows how a more error-prone measurement of X yields more attenuation in estimating the coefficient for X .

Aside from the particular form of the attenuation factor, the other messages from Section 3.1.1 remain unchanged. Testing the null hypothesis that 𝛽∗X = 0 is a valid test of the hypothesis that 𝛽X = 0. However, the test based on (Y , X*) data has lower power than the ideal test based on (Y , X) data, since the association between Y and X* is necessarily weaker than the association between Y and X .38,42

Returning to the magnitude of attenuation, Equation (14) also permits identification of problematic situations. For example, say Pr(X = 1) is close to zero (a “rare exposure”). Then, since PPV = {1 + (Pr(X = 0)/Pr(X = 1))((1 − Sp)/Sn)}−1, even a relatively high specificity can produce a very low PPV. For instance, if Pr(X = 1) = 0.01, a specificity of 0.9 nevertheless leads to a PPV less than 0.1. In turn this produces massive attenuation.8(sec. 3.1)
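Equation (14) and the PPV formula above are straightforward to evaluate directly. The sketch below reproduces the rare-exposure example (Sn = 0.9 is an arbitrary choice; the PPV bound holds for any sensitivity up to 1):

```python
def attenuation(prev, sn, sp):
    """lambda = NPV + PPV - 1 for nondifferential misclassification of binary X."""
    ppv = 1.0 / (1.0 + ((1 - prev) / prev) * ((1 - sp) / sn))
    npv = 1.0 / (1.0 + (prev / (1 - prev)) * ((1 - sn) / sp))
    return ppv + npv - 1

# rare exposure: Pr(X = 1) = 0.01 with Sn = Sp = 0.9
print(round(attenuation(0.01, 0.9, 0.9), 3))   # massive attenuation, about 0.08
# common exposure for comparison: Pr(X = 1) = 0.5
print(round(attenuation(0.5, 0.9, 0.9), 3))    # 0.8
```

With the exposure prevalence at 0.01 the attenuation factor drops to roughly 0.08, while the same Sn and Sp at prevalence 0.5 give 𝜆 = 0.8.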

3.2.2 Regression with a single misclassified covariate and other exactly measured covariates

In Section 3.1.2, we considered using, by necessity, the linear regression of Y on X* and Z when really wanting to regress Y on X and Z. When Z is scalar (of any type), X is binary and the misclassification is nondifferential (with respect to Y and Z), an expression for the attenuation factor 𝜆 = 𝛽∗X∕𝛽X is given in Section 3.2 of Gustafson's book.8 The result does not depend on assuming that the linear model for Y given (X , Z) is correctly specified. Rather, the attenuation factor is


F I G U R E 2 Effects of nondifferential misclassification in binary X on the regression coefficient in a linear regression of continuous Y on X. 𝛽X is the regression coefficient in a regression of Y on X and 𝛽∗X is the regression coefficient in a regression of Y on X* (misclassified X). The attenuation factor 𝜆 (Equation (14)) is a function of Pr(X = 1) and the sensitivity (Sn) and specificity (Sp) of X*. The thick line is the line 𝛽∗X = 𝛽X

defined as the ratio of large-sample limits for the estimated regression coefficients. The expression for the attenuation factor is unwieldy and not reproduced here. Some of its properties, however, are intuitively helpful. In particular, when X and Z are uncorrelated, the attenuation reduces to Equation (14). Also, with all other aspects of the problem fixed, the attenuation factor decreases as the magnitude of the correlation between X and Z increases. This permits an expansion of the list of problematic situations. We have already mentioned as problematic modest misclassification (and imperfect specificity particularly) of a rare exposure, but if that rare exposure X is strongly associated with a precisely measured covariate Z, then even stronger attenuation occurs.

Again, the main message about hypothesis testing is the same as for continuous X (Section 3.1.2). Assuming nondifferential misclassification, Y and X* will be associated given Z if and only if Y and X are associated given Z. Hence a test for 𝛽∗X = 0 using available data will be valid as a test for 𝛽X = 0, but will have less power than could be achieved were (Y , X , Z) data available.

Also in line with the case of continuous X , the misclassification of X implies that the coefficients of Z estimated from regressing Y on X* and Z are biased for the coefficients of Z in the Y given X and Z model. When Z is scalar, an explicit expression for this bias is given in Gustafson's book.8(sec. 3.2)

3.2.3 Regression with multiple misclassified covariates

Situations where two or more categorical covariates are subject to misclassification have not received very much attention, either theoretically or in practice. The added complexity discussed for the continuous case in Section 3.1.3 applies here as well. Even if the model of Equation (10) holds for X1 and X2 that are both binary, and even if the misclassification mechanism is simple, unwieldy forms for Equation (12) result. It is worth considering what “simple” or “nicely behaved” can mean in the face of two binary covariates subject to misclassification. We could assume, for example, that as a pair (X∗1 ,X∗2 ) have nondifferential misclassification, and we could further assume “independent errors,” that is, conditional independence of X∗1 and X∗2 given (X1, X2). Under these assumptions, and for given sensitivity and specificity of each error-prone measurement, it is easy to determine E(Y |X∗1 ,X∗2 ) from E(Y | X1, X2). However, we do not obtain simple and interpretable expressions. In particular, and in line with Section 3.1.3, the X∗1 coefficient will be a sum of a term involving 𝛽X1 and a term involving 𝛽X2. A simple attenuation structure does not emerge.

3.2.4 Other common situations

What we know about the impact of a misclassified covariate in a linear model for a continuous outcome carries over approximately, but not exactly, to generalized linear models for other types of outcomes. Closed-form expressions are elusive here. For example, Gustafson8(sec. 3.4) gives a numerical algorithm to determine the large-sample limits of logistic regression of a binary Y on X* and Z, when the logistic regression of Y on X and Z is of interest. For fixed 𝛽0 and 𝛽Z, 𝛽∗X is seen to vary almost linearly with 𝛽X , and this relationship varies only slightly with 𝛽0 and 𝛽Z. As with so many other statistical concepts, what holds exactly in the linear model holds approximately in the generalized linear model.

However, such intuitions about attenuation do not extend to misclassification of a categorical covariate having more than two categories. Recall that a matrix of misclassification probabilities governs such misclassification, with entries pij = Pr(X* = j| X = i). Even given nondifferential misclassification, it is straightforward to construct a plausible misclassification matrix for which E(Y | X*) has a different pattern than E(Y | X), in which attenuation does not occur. For example, suppose X is ordinal. Then, in comparing levels X = a and X = a + 1, E(Y | X* = a + 1) − E(Y | X* = a) can be larger in magnitude than E(Y | X = a + 1) − E(Y | X = a). Some work that investigates the polychotomous case includes Dosemeci et al43 and Weinberg et al,44 but our emphasis here is really on the irregularity of the impact of misclassification.

3.3 Effects of measurement error in an outcome variable

In Sections 3.1 and 3.2, we have focused on the effects of error in covariates. We consider now the effects of measurement error in an outcome variable, Y . Recall that the error-prone version of Y is denoted Y *. We assume that covariates are measured without error and, for simplicity, we focus on a single covariate X , though the results extend easily to multiple covariates.

3.3.1 Continuous outcomes

Suppose that our analysis is based on the linear regression model

E(Y |X) = 𝛽0 + 𝛽X X . (15)

Because of measurement error in Y , we instead use the linear regression model

E(Y∗|X) = 𝛽∗0 + 𝛽∗X X . (16)

As in the case of measurement error in a covariate, our interest is in whether and how 𝛽∗X is different from 𝛽X , whether the precision with which we estimate 𝛽∗X is different from the precision with which we estimate 𝛽X , and whether the usual statistical test of the hypothesis that 𝛽∗X = 0 is a valid test of the hypothesis that 𝛽X = 0.

Under the classical error model for Y *, that is Y * = Y + U (as in Equation (1)), Y * is an unbiased measure of Y . Hence the expectation of Y * is equal to the expectation of Y ; E(Y *) = E(Y ). The same holds if we condition on any covariates X ; E(Y *| X) = E(Y | X). Therefore, a regression of Y * on X yields unbiased estimates of 𝛽0 and 𝛽X , in other words 𝛽∗0 = 𝛽0 and 𝛽∗X = 𝛽X . See the graph on the left-hand side of Figure 3. This contrasts with the attenuation effect of classical measurement error in a single covariate X . It follows also that a test of the hypothesis that 𝛽∗X = 0 is a valid test of the hypothesis that 𝛽X = 0. Although the regression coefficients are not affected by replacing Y by the error-prone measure Y * when the error is classical, the fitted line from the regression of Y * on X will have greater uncertainty than that from a regression of Y on X , and the precision with which 𝛽∗X is estimated using Y * is lower than that with which 𝛽X is estimated using Y . Consequently, the power to detect an association between X and the outcome is lower when using Y * than when using Y . One way of understanding this is to note that var(Y *| X) = var(Y | X) + var(U). The additional variability in Y * compared with Y is absorbed into the residual variance in the regression of Y * on X , and the variance of the estimator 𝛽∗X is a function of the residual variance.
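A short numerical sketch (not from the article; settings echo the Figure 3 simulation) confirms both points, an unbiased slope and an inflated residual variance:

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 100_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)                     # beta_X = 1, var(Y|X) = 1
y_star = y + rng.normal(0.0, np.sqrt(3.0), n)  # classical error in the outcome

b, a = np.polyfit(x, y_star, 1)                # slope, intercept
resid_var = np.var(y_star - (a + b * x))
print(round(b, 3), round(resid_var, 2))        # slope near 1; residual variance near 4
```

The residual variance is close to var(Y|X) + var(U) = 4, so confidence intervals for 𝛽X are wider than they would be with the true Y, even though the point estimate is unbiased.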

Suppose now, instead, that the error in Y * takes the linear measurement error form

Y∗ = 𝛼0 + 𝛼Y Y + U, (17)

where U is a random variable with mean 0, constant variance, and independent of Y , as in Equation (2). Under this model we have E(Y *) = 𝛼0 + 𝛼Y E(Y ). It follows, from Equation (15), that E(Y *| X) = (𝛼0 + 𝛽0𝛼Y ) + 𝛼Y𝛽X X . Measurement error of this form therefore results in biased estimates of the association between X and the outcome.


F I G U R E 3 Simulated data on 20 individuals showing the effects of classical error and Berkson error in continuous Y on the fitted regression line. For both plots X was generated from a normal distribution with mean 0, variance 1 and the errors U were generated from a normal distribution with mean 0 and variance 3. Classical error plot: Y was generated with mean X and variance 1. Y * was generated using Y * = Y + U. The difference in slopes is due entirely to sampling error. Berkson error plot: Y * was generated with mean X and variance 1. The difference in slopes is due to attenuation from the measurement error in Y . Y given X was generated from the normal distribution implied by the model for Y * and the Berkson error model Y = Y * + U

In the measurement error models for Y * considered above, the error is nondifferential with respect to X . One example of when differential measurement error in an outcome could arise is in a randomized study of two treatments (X), in which the nature of the treatments results in differential reporting of the outcome in the two treatment groups. Differential error in Y * may take the simple classical form, as in Equation (1), but with different error variances, var(U), in the two treatment groups. This does not result in bias in the estimate of 𝛽∗X . However, it does result in heteroscedasticity in the residual variance. More usually, differential measurement error in an outcome could take the form of different degrees of systematic error: Y * = 𝛼0X + 𝛼YX Y + U for two groups X = 0 and 1. The effect of this is that an estimate of 𝛽∗X is a biased estimate of 𝛽X . The bias may be either towards or away from the null value 0, depending on the form of the differential error.45

Outcome variables may also be subject to Berkson error, though this is perhaps less common than Berkson error in covariates. We explained in Section 2.1 how Berkson error arises in variables that are derived as the result of a prediction or calibration equation. Hence Berkson error in an outcome could arise if, instead of observing Y , we observe Y * which has been obtained as the fitted value from a prediction model for the outcome Y . The Berkson error model (Equation (3)) is Y = Y * + U, where U has mean zero and is independent of Y *. Here we focus on nondifferential Berkson error, meaning that Y * and X are independent conditional on Y . To understand the effect of nondifferential Berkson error in an outcome variable, recall that under this error model, and when Y * and U are normally distributed, the measured outcome follows a linear regression model E(Y *| Y ) = 𝛼0 + 𝛼Y Y . Using this result, we can see that the coefficient 𝛽∗X in Equation (16) can be expressed as 𝛽∗X = cov(X, Y∗)/var(X) = 𝛼Y cov(X, Y)/var(X). The true coefficient of interest from Equation (15) is 𝛽X = cov(X, Y)/var(X). Also using the result that 𝛼Y = cov(Y, Y∗)/var(Y), we have the relation 𝛽∗X = {cov(Y, Y∗)/var(Y)}𝛽X = {var(Y∗)/var(Y)}𝛽X = [var(Y∗)/{var(Y∗) + var(U)}]𝛽X. Since the ratio var(Y∗)/{var(Y∗) + var(U)} lies between 0 and 1, the effect of this type of error is to attenuate the estimated regression coefficient.46 See the graph on the right-hand side in Figure 3.

Recalling that in a simple linear regression of Y on X , Berkson error in covariate X causes no bias in the estimated regression coefficient, one sees that the effects of nondifferential classical error and Berkson error in an outcome variable are the reverse of their effects in a covariate. As with differential classical error, differential Berkson error in an outcome variable may cause over-estimation or under-estimation of 𝛽X .
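The attenuation factor var(Y∗)/{var(Y∗) + var(U)} can be checked by simulation. In this sketch (hypothetical variances, not from the article), X is generated from Y alone, so that Y* and X are independent given Y, as the nondifferential Berkson condition requires:

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000
y_star = rng.normal(0.0, np.sqrt(2.0), n)      # predicted outcome, var(Y*) = 2
y = y_star + rng.normal(0.0, np.sqrt(3.0), n)  # Berkson: Y = Y* + U, var(U) = 3
x = 0.2 * y + rng.normal(size=n)               # X depends on Y only (nondifferential)

beta = np.polyfit(x, y, 1)[0]                  # target coefficient beta_X
beta_star = np.polyfit(x, y_star, 1)[0]        # naive coefficient using Y*
print(round(beta_star / beta, 3))              # close to var(Y*)/var(Y) = 2/5
```

The ratio of naive to true slope recovers var(Y∗)/var(Y) = 0.4, the predicted attenuation.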

3.3.2 Binary outcomes

F I G U R E 4 Effects of nondifferential and differential misclassification in Y on the log odds ratio. 𝛽∗X is the log odds ratio for Y * given X (Equation (20)) and 𝛽X is the log odds ratio for Y given X (Equation (19)). The covariate X is binary (for simplicity) and we assume 𝛽0 = 0 in Equation (19). Sn and Sp denote the nondifferential sensitivity and specificity for Y *, and Sn(X) and Sp(X) denote the differential versions for X = 0, 1. The thick line is the line 𝛽∗X = 𝛽X

We have seen in Section 3.3.1 that nondifferential and unbiased measurement error (as in model (1)), yielding a surrogate Y * for continuous Y , preserves linear model structure, that is, E(Y *| X) = E(Y | X), with the only impact of the measurement error being an increase in residual variance. However, it is quickly apparent that this same relationship does not hold if Y is categorical, as we now show. Misclassification in a binary outcome (Section 2.2) can be expressed in terms of the sensitivity Sn(X) = Pr(Y * = 1 ∣Y = 1, X) and specificity Sp(X) = Pr(Y * = 0 ∣Y = 0, X). The sensitivity and specificity may be differential, that is, dependent on X , or nondifferential, in which case Sn(X) and Sp(X) do not depend on X . Noting that E(Y | X) = Pr(Y = 1| X) and E(Y *| X) = Pr(Y * = 1| X), these probabilities are related using the sensitivity and specificity as

Pr(Y∗ = 1|X) = (1 − Sp(X)) + (Sn(X) + Sp(X) − 1)Pr(Y = 1|X), (18)

and are equal only when Sp(X) and Sn(X) equal 1.

The association between a binary outcome and covariate X is typically modeled using a logistic regression model, for example

log{Pr(Y = 1|X)/Pr(Y = 0|X)} = 𝛽0 + 𝛽X X , (19)

where 𝛽X is the log odds ratio of interest. Using the measured outcome, we would instead fit the model

log{Pr(Y∗ = 1|X)/Pr(Y∗ = 0|X)} = 𝛽∗0 + 𝛽∗X X . (20)

It can be shown that, provided the misclassification in Y is nondifferential with respect to X , meaning that Sn(X) and Sp(X) do not depend on X , the impact of the misclassification is that the log odds ratio 𝛽∗X is attenuated relative to 𝛽X .47 However, if the misclassification is differential, that is if the sensitivity or specificity differ for different values of X , the effect on the log odds ratio can be a bias either away from or towards the null value 0.7(sec. 3.4) The effects of nondifferential and differential misclassification are illustrated in Figure 4.
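For binary X, Equation (18) lets one compute the naive log odds ratio exactly. A small sketch (hypothetical Sn and Sp values, not from the article):

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def naive_log_or(beta0, beta_x, sn, sp):
    """Log odds ratio for Y* given binary X, computed via Equation (18)."""
    p = [expit(beta0), expit(beta0 + beta_x)]                # Pr(Y = 1 | X = 0, 1)
    p_star = [(1 - sp) + (sn + sp - 1) * q for q in p]       # Pr(Y* = 1 | X = 0, 1)
    return logit(p_star[1]) - logit(p_star[0])

# nondifferential Sn = Sp = 0.9 attenuates a true log OR of 1
print(round(naive_log_or(0.0, 1.0, 0.9, 0.9), 3))
```

Here the naive log odds ratio is about 0.78 rather than 1. Letting Sn or Sp vary with X (differential misclassification) moves the result either towards or away from the null, consistent with Figure 4.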

3.4 Effects of measurement error on estimating the distribution of a variable

In some cases, there is interest in describing what we have termed an “outcome” variable not in relationship to other variables, but to estimate the distribution of the variable in the population. Examples include estimating the distribution of food intakes and physical activity levels for a population.48,49 As above, we consider the true outcome, Y , to be the variable that we want to measure, and its measurement, Y *, to be an error-prone version.

Most commonly, we are interested in a continuous measure and assume a classical error model for Y *. As noted inSection 3.3.1, the classical error model leads to E(Y *) = E(Y ), and var(Y *) = var(Y )+ var(U); in other words, the mean ofthe distribution of Y * is unbiased, but the variance of Y * overestimates the variance of Y . Under a Berkson error model,the mean of Y * is again unbiased for the mean of Y , but var(Y *) = var(Y )− var(U) so the variance of Y * underestimates

Page 15: stratos-initiative.org...2 KEOGHetal. 11BiometryResearchGroup,Divisionof CancerPrevention,NationalCancer Institute,Bethesda,Maryland,USA 12BiostatisticsandBiomathematicsUnit

KEOGH et al. 15

T A B L E 1 Effects of measurement error according to type of error and target of the analysis

| Analysis | Target | Nondifferential: Classical | Nondifferential: Linear | Nondifferential: Berkson | Differential: Any |
|---|---|---|---|---|---|
| Single error-prone covariate regression | Regression coefficient | Underestimated | Biased in either direction | Sometimes unbiased^a | Biased in either direction |
| | Test of null hypothesis | Valid | Valid | Sometimes valid^a | Invalid |
| | Power | Reduced | Reduced | Reduced^b | Not applicable^b |
| Regression with multiple error-prone covariates | Regression coefficients | Biased in either direction | Biased in either direction | Sometimes unbiased^a | Biased in either direction |
| | Tests of null hypothesis | Invalid | Invalid | Sometimes valid^a | Invalid |
| | Power | Not applicable^b | Not applicable^b | Reduced^b | Not applicable^b |
| Regression with error-prone outcome variable | Regression coefficients | Unbiased | Biased in either direction | Underestimated | Biased in either direction |
| | Tests of null hypothesis | Valid | Valid | Valid | Invalid |
| | Power | Reduced | Reduced | Reduced | Not applicable^b |
| Distribution with an error-prone continuous variable | Mean | Unbiased | Biased in either direction | Unbiased | — |
| | Lower tail percentiles^c | Underestimated | Biased in either direction | Overestimated | — |
| | Upper tail percentiles^c | Overestimated | Biased in either direction | Underestimated | — |

^a Unbiased and valid only when the Berkson error is independent of the other covariates in the model.
^b The power of a test is only meaningful when the test of the null hypothesis is valid.
^c The percentiles affected depend on the distribution of Y.

the variance of Y.23 For other types of error model, such as the linear measurement error model, the variance of Y* may vary in either direction from the variance of Y. Thus, the distribution of Y* is generally biased for important features of the distribution of Y. In Part 2, Section 3, we discuss methods to estimate the distribution of Y when only an error-prone Y* can be observed.
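The variance relationships above can be checked in a short simulation. This is a sketch with arbitrary illustrative choices of the variances, sample size, and seed (none of these values come from the article):

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000
var_y, var_u = 1.0, 0.25  # illustrative variances of true Y and error U

# Classical error: Y* = Y + U with U independent of Y,
# so var(Y*) = var(Y) + var(U).
y = rng.normal(0.0, np.sqrt(var_y), n)
y_star_classical = y + rng.normal(0.0, np.sqrt(var_u), n)

# Berkson error: Y = Y* + U with U independent of Y*,
# so var(Y*) = var(Y) - var(U).
y_star_berkson = rng.normal(0.0, np.sqrt(var_y - var_u), n)
y_berkson = y_star_berkson + rng.normal(0.0, np.sqrt(var_u), n)

print(np.var(y_star_classical))   # close to var(Y) + var(U) = 1.25
print(np.var(y_star_berkson))     # close to var(Y) - var(U) = 0.75

# Classical error pushes the upper tail percentile outward:
print(np.percentile(y, 97.5), np.percentile(y_star_classical, 97.5))
```

Both error models leave the mean unbiased, which is why the distortions listed in Table 1 for the distribution of a continuous variable are confined to the tail percentiles.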

3.5 Summary of results in this section

This section has dealt with the effects of measurement error and misclassification on the results of commonly used statistical procedures. To provide an overview, we summarize the main results in Table 1. In Section 6, we will consider some analysis methods commonly used with continuous variables to mitigate these effects. However, because these adjustments usually require data from ancillary studies that investigate the measurement error, we first consider such studies in Sections 4 and 5.

4 ANCILLARY STUDIES FOR ASSESSING THE NATURE AND MAGNITUDE OF MEASUREMENT ERROR

4.1 General principles

To adjust estimates and hypothesis tests for the effects of measurement error, one needs information on the measurement error model and its parameters. If, at the time of designing the study, the measurement error model and its parameters are fully known, then one may use this information to form a correct analysis method. In epidemiology, however, there is


often a severe lack of information about measurement errors, and ancillary studies are sorely needed for ascertaining the nature and magnitude of the measurement error.

Ancillary studies involve the use of additional measurements alongside the error-prone measurement X* to provide information about the measurement error. In practice, the nature of ancillary studies varies according to the type of measurement error expected and the availability of more accurate methods of measurement that may be used as reference values. We will see this in Section 4.3, where we present a brief survey of reference instruments available for selected exposures used in epidemiology. There is no universally accepted terminology for the different types of ancillary studies that are used. Here we describe three types, which we refer to as validation studies, calibration studies, and replicates studies.50

4.2 Classification of the types of ancillary study

In a validation study, a measurement of the true value of the variable, X, is obtained for some individuals, as well as the main error-prone measurement X*. The measure of the true value X is often also referred to as the reference measurement. A validation study is the cleanest type of ancillary study, and in this case the model relating X* to X can be inferred directly from the data.

Suppose X* follows the linear measurement error model in (2). If the true value X cannot be ascertained, then a measurement that is unbiased at the individual level (ie, that is known to conform to the classical measurement error model (1)) may be used in its place. We call this a calibration study.51 The unbiased measurement, which we denote X**, is also often referred to as the reference measurement, and it comes with the extra requirement that its random errors are independent of the errors of the main error-prone measurement X*.

A calibration study, as described above, can provide the data for the method of measurement error adjustment known as RC that we will present in Section 6. A single measure using the reference instrument is sufficient to enable use of RC. However, to identify all the parameters of the measurement error models for X* and X**, the reference measurement with classical error must be repeated within individuals so as to assess the magnitude of its random error. In particular, this enables the correlation coefficient between the main error-prone measurement X* and the true value X to be estimated. A key assumption here is that the errors in the repeated measurements obtained using the reference instrument are independent. The repeated measures should be obtained sufficiently far apart in time to ensure such independence, though not so far apart that the true underlying value has changed for an individual.

When working with self-reported data, it sometimes occurs that X* comes from a short questionnaire that is inexpensive to collect, and the desire is to validate it against another, more intensive self-report procedure that is used as the reference measurement. Unfortunately, this reference measurement may also be biased (although less so), and it is often found that its errors are correlated with those of the short questionnaire. This can be considered a type of calibration study, but one with an imperfect reference measurement. In this case, the calibration study will provide estimates of the parameters of the measurement error or misclassification model that are somewhat biased.

A special case, commonly occurring in epidemiology, is where it is assumed that the main error-prone measurement X* has classical measurement error. Then the parameters of the measurement error model may be estimated from repeated applications of the main error-prone measurement method within individuals. No measurements of the true value of the variable are required. We refer to an ancillary study of this type as a replicates study; it is also sometimes known as a reproducibility study or a reliability study. Under the assumption that the errors in the repeated measurements of X* are independent, the data from a replicates study are sufficient to estimate the parameters of the classical measurement error model. Carroll et al6(sec.1.7) describe how to use data from such a study to check whether the assumptions of the classical model really hold. It is sometimes found that the classical model holds only after a transformation (eg, logarithmic) of the variable.
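As a sketch of the replicates-study calculation, with simulated data and made-up variances (the article gives no numerical example here), half the mean squared difference between two replicate measurements estimates the classical error variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
var_x, var_u = 1.0, 0.5  # hypothetical true and error variances

# Simulated replicates study: two measurements X*_1, X*_2 per person,
# each with independent classical error around the same true X.
x = rng.normal(5.0, np.sqrt(var_x), n)
x1 = x + rng.normal(0.0, np.sqrt(var_u), n)
x2 = x + rng.normal(0.0, np.sqrt(var_u), n)

# Under the classical model, half the mean squared within-person
# difference estimates the error variance var(U).
var_u_hat = np.mean((x1 - x2) ** 2) / 2

# var(X) is then estimated by subtracting var(U) from var(X*).
var_x_hat = np.var(x1, ddof=1) - var_u_hat

# Attenuation coefficient: lambda = var(X) / (var(X) + var(U)).
lam_hat = var_x_hat / (var_x_hat + var_u_hat)
print(var_u_hat, var_x_hat, lam_hat)  # roughly 0.5, 1.0, 0.67
```

The final line is the attenuation coefficient for a single classically mismeasured covariate, the quantity needed for the RC adjustment of Section 6.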

Ancillary studies of the three types described above are best nested within the main study. For example, a subgroup of participants in a cohort study may be asked to provide not only the main error-prone measurement of exposure but also the additional measurement(s), these being the true measurement X (validation study), the reference measurement (calibration study), or the repeated measure (replicates study). In this case, the study is called an internal study. It is usually desirable that the subgroup of participants be, as far as possible, a random sample (simple or stratified) of those in the main study. In settings where E(X | X*, Z) is being estimated via a regression, sampling schemes stratified on variables in this regression could achieve better precision of the coefficients in the calibration equation compared with simple random sampling. For example, in linear regression, oversampling the


extremes, thereby increasing the variance of X*, can be more efficient than simple random sampling in terms of decreasing the variance of the estimated regression parameters; optimal sampling schemes for multivariable regression can also be derived.52

Ancillary studies that are conducted on a group of individuals not participating in the main study are called external studies. External studies are less reliable than internal ones for determining the parameters of the measurement error model, since the estimation involves an assumption of transportability between the group of participants in the ancillary study and the group participating in the main study. Carroll et al6(sec.2.2.5) describe the dangers of transporting a model derived from an external study. However, in many circumstances, the only information available about the measurement error comes from an external study, and careful use of such information (accompanied by sensitivity analyses) can add greatly to the understanding of results (see Part 2, Section 6 of our article).

Estimation of the error variance in a Berkson model is often problematic, since in these applications reference measurements are typically difficult to obtain. In some applications, the Berkson error comes from the use of an X* derived from a prediction equation for X, and in that case the residual error variance estimated from the source data that yielded the equation can serve as the Berkson error variance estimate. See, for example, Tooze et al.26

In Section 5 we will discuss the desirable size of a validation, calibration, or replicates study. For further reading on these types of study, see Kaaks et al.53

We will see each of the types of study described above in the following survey of exposure measurements in different areas of epidemiology. Before proceeding to the survey, it should be noted that, when reporting the results of validation, calibration, or replicates studies, most investigators limit themselves to presenting correlations between the measurements from their instrument and the reference instrument (sometimes adjusting for the within-person variation in the reference measurement). However, they usually do not use the information from their study to determine the measurement error model and its parameters. As a result, the information required for adjusting estimates in the main study for measurement error or misclassification is not reported or used, and the study serves simply to report that the study instrument has been “validated”!5 Investigators should be encouraged to use ancillary study data to better interpret the results of their main study.

This section has implicitly focused on the situation in which error is nondifferential. However, similar principles apply when there is differential measurement error. In that case, parameters of the measurement error model depend on the outcome Y, and data in the ancillary study should be obtained in such a way that all relevant parameters can be estimated. In particular, the ancillary study requires information on the outcome. This is discussed further in Part 2, Section 2, where measurement error correction methods that address differential error are outlined.

4.3 Reference instruments available for selected exposures used in epidemiology

4.3.1 Nutrition

Of all the areas of epidemiology, nutritional epidemiology has probably paid the most attention to measurement errors of exposure.54 Nearly all nutritional epidemiological studies rely on self-reported dietary intakes as their main measure of exposure. However, these are known to be subject to considerable error, especially if exposure is defined as the usual (or average) intake over a long period, the measure that is thought to be of most relevance to the epidemiology of chronic diseases. There is no known way of obtaining an exact value of this measure, so true validation studies do not exist.

For a few dietary components (energy, protein, potassium, and sodium), unbiased measurements of short-term intake exist—they are called recovery biomarkers—and can be used as the reference measurements in calibration studies.53

For all other dietary components—foods (eg, vegetables, meat) and other nutrients (eg, fat, fiber)—the usual practice is to rely on a second, more accurate self-report method as the reference measurement (a calibration study with an imperfect reference measurement). When the main measurement X* is a food frequency questionnaire (FFQ)—a relatively short questionnaire that asks the individual to report on average intake over the past several months (up to 12 months)—24-hour recalls or multiple-day food records, in which the participant reports on intakes over a short period in the immediate past, are used as the reference. This is less than ideal, since these more accurate methods are nevertheless somewhat biased and also have errors that are correlated with the errors in the FFQ report. However, they are the best method currently available.55 Prentice and others, using data from a unique large feeding


study,56 are currently engaged in expanding the list of dietary components for which unbiased measurements are available.

Another type of reference instrument is a (nonrecovery) biomarker that is related to the intake of the nutrient or food consumed (eg, serum cholesterol for saturated fat intake). Such biomarkers are usually subject to a high degree of metabolic regulation that varies across individuals; consequently, they do not provide an unbiased measure of intake and are not, by themselves, helpful in determining the measurement error model, although they have been used alongside other methods to gain understanding of measurement error.57,58

The measurement error model for self-reported dietary intakes has been shown not to conform to the classical model,17 so replicates studies are not a true option. However, many investigators, in the absence of anything better, have adopted the assumption that 24-hour recalls provide unbiased measurements and have based measurement error adjustment on studies of repeated measurements from this instrument (eg, Beaton et al59).

4.3.2 Physical activity

As in nutritional epidemiology, in large studies physical activity has mostly been assessed by self-report using questionnaires, of which there are many variants. A large number of smaller studies have been conducted to “validate” these questionnaires (see the National Cancer Institute (NCI) website https://epi.grants.cancer.gov/paq/validation.html), and a few studies have used a measurement error model framework to adjust estimated associations of physical activity with health outcomes, for example Spiegelman et al,60 Ferrari et al,3 Nusser et al,61 Tooze et al,26 Neuhouser et al,62 Lim et al,63 Matthews et al,64 and Shaw et al.65 The reference instruments that have been used generally fall into three main categories: doubly labeled water, accelerometers, and physical activity diaries. Doubly labeled water is a technique used to obtain an unbiased measure of total energy expenditure (TEE), and it is useful for determining the measurement error model for TEE measured by a questionnaire through a calibration study.66

Since many physical activity questionnaires and recalls are designed to measure physical activity level (PAL), which is defined as the ratio of TEE to basal energy expenditure (BEE), determining a measurement error model for PAL also requires a reference measure for BEE, which may be provided by direct or indirect calorimetry. BEE has also been estimated by using a prediction equation; however, this measure of BEE exhibits Berkson error (Equation (3)). In this case, it may be necessary to have a calorimetry measure of BEE on at least a subset of participants to form a suitable reference measurement for PAL.26

Unbiased reference measures of other physical activity measurements that can be derived from questionnaires, such as hours of moderate or vigorous activity, are currently lacking. Accelerometers do provide information on such measurements and, although not completely unbiased, they may be used as the reference in a calibration study with an imperfect reference measurement. Physical activity diaries are more accurate than questionnaires,3 but still rely on self-report. They may therefore be regarded in a similar manner to 24-hour recalls or multiple-day records of food intake: not ideal as references, but usable in circumstances where other references, such as accelerometers or doubly labeled water, are infeasible or unsuitable (eg, the activity measure of interest is something that cannot be measured by either of them, such as the amount of time spent in anaerobic exercise).

4.3.3 Smoking

Since smoking is a causal factor in a range of chronic diseases, it is often collected as a potential confounding variable in chronic disease epidemiology studies. The usual mode of collection is through self-report questionnaires. The most common method of “validation” of self-reported smoking status is biochemical. Three different metabolites may be measured: thiocyanate in the blood, urine, or saliva; cotinine in the saliva, blood, urine, or hair; and exhaled carbon monoxide.67 Although these measurements are made on a continuous scale, they have been used mostly as binary indicators (smoker or nonsmoker) using a predetermined cut-off point (which has varied among investigators). Thus, these studies report misclassification rates (sensitivity and specificity), rather than measurement error model parameters or correlations. When calculating these rates, the investigators have usually assumed that the biochemical measurement yields the true smoking status, and thus that the study is a validation study. Biochemical validation has been used most frequently in smoking cessation trials, where it has become the standard method of assessing the outcome.68 Its use in observational studies is less widespread, but nevertheless many validation studies have been conducted in this setting. For a review of cotinine-based validation studies, see Rebagliato.69


4.3.4 Air pollution

Research studies into links between air pollution exposure and health outcomes have often taken the form of longitudinal studies, where time series of air pollution levels in different geographical areas are compared with levels of a disease, such as asthma, at the same location and time (with a possible lag effect). Exposures are often assessed using a mathematical model that is applied to serial measurements of the concentration of certain particles in the air at fixed locations and to other information such as temperature, wind strength and direction, and topology, so as to provide an estimate of the ambient pollution at a given time and location. Zeger et al4 discuss the type of measurement error inherent in such estimated exposures when used as measures of exposure at the individual level, and conclude that it is a mixture of Berkson and classical errors (see Part 2, Section 5.1 of our article). The accepted gold standard for measuring personal exposure is through a personal monitoring device, which is assumed to provide unbiased measures of true exposure. For a review, see Koehler and Peters.70 For example, exposure at the individual level was recorded in the PTEAM study on a personal monitor measuring inhalable (PM10) or fine (PM2.5) particles71; this can then be used in a calibration study. In the Augsburger Environmental Study, repeated measures from a personal monitor measuring ultrafine particle concentrations were available from an external sample, giving an external replicates study.72 For the Augsburger Environmental Study, methods for handling the mixture of classical and Berkson error were developed in Deffner et al73 and applied to the data. A valuable resource for investigation of measurement error modeling methods are the data from the Nine City Validation Study74 on personal daily exposure to PM2.5 particles, compared to PM2.5 of ambient origin based on the nearest EPA monitor and spatio-temporal smoothed exposure estimates. These data may be requested at the website https://www.hsph.harvard.edu/pm2-5-validation-dataset/.

4.3.5 Other exposures

There is, of course, no end of other exposures that are relevant to questions of public health, and it can be assumed that many of them cannot be measured without substantial error. In some cases, such as blood pressure measurement, a gold standard measurement (intra-arterial blood pressure) exists, and validation or calibration studies of more approximate measurement procedures (eg, sphygmomanometer) can be conducted to determine the measurement error model. In other cases, it is thought that the measurement, if not exact, is at least unbiased (eg, serum cholesterol), and replicates studies are sufficient to estimate the statistical magnitude of the error and make corrections for its impact. However, often neither of these situations holds, and the best that can be done is to compare the main measurement method used with a method that, although imperfect, is thought to be better (a calibration study with an imperfect reference measurement).75

For example, body mass index (BMI), as a measure of body fat, may be compared with percent body fat measured by bioelectrical impedance analysis.76 In Part 2, Section 6 of our article, we discuss these many cases where it is known that the measured exposure is subject to considerable measurement error, but the measurement error model is in some sense unknown.

5 DESIGN OF STUDIES WHERE ONE OR MORE OF THE MAJOR COVARIATES IS MEASURED WITH ERROR

In view of the impacts that measurement error or misclassification have on study results, it is advisable to take account of the error at the design stage. To do that, aside from defining the main aim of the study and its target estimate, one needs information on the measurement error model and its parameters. Only then can one make the appropriate adjustment to the design in the form of a change in sample size or, more fundamentally, a change in the measurement of the error-prone variable in question. In addition, as we have already shown in Section 4, information about the measurement error model plays a central role in making adjustments for measurement error in the analysis of the main study. A number of authors have discussed design issues in the presence of measurement error.12,77-79

In Section 4, we described ancillary studies that are conducted to obtain information about the measurement error model. We will now deal with determining how large such studies should be, and then proceed to methods for calculating statistical power in the presence of measurement error as an aid to deciding on the design of the main study.


Our focus in this section is on the size of the ancillary study. An alternative scenario is that a total cost is assigned and that sample sizes for the main study and ancillary study are derived according to requirements for meeting a specified objective. More work is needed on sample size calculations for this situation.

5.1 Size of ancillary studies

Relatively little attention has been paid to the appropriate size of an ancillary substudy, and we provide here a guideline. A first principle is that the size of the ancillary study is decided in relation to its contribution to the main study's goal. Therefore, we must specify, among other things, what is the target of interest in the main study and what is the desired statistical power for testing the exposure-outcome association.

As in Section 3.1.1, we limit ourselves to the situation of a single exposure X that is measured by X* with nondifferential linear measurement error, and focus on the aim of estimating the slope in a simple linear regression model of a health outcome Y on X. To recap, we consider the model:

E(Y |X) = 𝛽0 + 𝛽X X . (21)

If we were to use X* in place of X , then we would obtain instead a different regression model:

E(Y |X∗) = 𝛽∗0 + 𝛽∗X X∗. (22)

In the main study, Y and X* are observed in all participants. In the ancillary substudy, a “reference” measurement, R, equal to X or an unbiased measurement of X, is observed in addition to X*. In the terminology of Section 4, we are therefore in the situation of a validation or calibration study. As discussed in Section 3.1.1, the linear model in (22) does not always hold, but is often a good approximation.

Recall from Section 3.1 that, when X* is measured with nondifferential linear error, the relation 𝛽∗X = 𝜆𝛽X holds, where 𝜆 is the attenuation coefficient. Therefore, a simple “adjusted estimate” of 𝛽X is obtained by dividing the estimate of 𝛽∗X (which comes from the main study) by an estimate of 𝜆 obtained from the ancillary study (see Section 6). In simple cases, 𝜆 is estimated by the slope of the linear regression of R on X*. The variance of the adjusted estimate of 𝛽X can then be expressed by the approximation80

var(𝛽X) = var(𝛽∗X)/𝜆² + (𝛽∗X)² var(𝜆)/𝜆⁴. (23)
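As a numerical sketch of the adjustment and of approximation (23), with all input values hypothetical rather than taken from any study:

```python
def adjusted_slope_variance(beta_star, var_beta_star, lam, var_lam):
    """Approximation (23): variance of the adjusted slope beta*_X / lambda."""
    return var_beta_star / lam**2 + beta_star**2 * var_lam / lam**4

# Hypothetical inputs: naive slope and its variance from the main study,
# attenuation coefficient and its variance from the ancillary study.
beta_star, var_beta_star = 0.20, 0.05**2
lam, var_lam = 0.40, 0.03**2

beta_adj = beta_star / lam   # adjusted estimate of beta_X
var_adj = adjusted_slope_variance(beta_star, var_beta_star, lam, var_lam)
second_term = beta_star**2 * var_lam / lam**4
print(beta_adj, var_adj, second_term / (var_adj - second_term))
```

The final ratio is the fraction f discussed in the text: with these inputs, the uncertainty in 𝜆 inflates the variance of the adjusted slope by 9% over the first term.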

The second term on the right-hand side of (23) represents the extra uncertainty introduced into the estimate of 𝛽X by the uncertainty in the value of 𝜆. We may choose the size of the validation/calibration study to minimize the impact of this extra uncertainty, specifically so that the second term on the right-hand side of (23) will be a small fraction, f, of the first term. For example, if the size of the main study will provide 50% power for a test of the null hypothesis that 𝛽∗X = 0 at the 5% significance level when the true value is 𝛽∗X, that is, approximately var(𝛽∗X) = (𝛽∗X)²/4, we obtain var(𝜆) = f𝜆²/4. If the investigator is planning to test the association with exposure at a two-sided level 𝛼 (maybe different from 0.05) with power 1−𝜔 (maybe different from 0.5), then the factor 4 may be replaced by [Φ−1(1−𝛼/2)+Φ−1(1−𝜔)]², where Φ is the standard normal cumulative distribution function. In the validation study case where the reference measurement R equals X, we can apply the formula for the variance of a regression slope and, simplifying, obtain the formula for the sample size of the validation study, nv, as:

nv = {Φ−1(1−𝛼/2) + Φ−1(1−𝜔)}² (1 − 𝜌²XX∗) / (f 𝜌²XX∗), (24)

where 𝜌²XX∗ is the squared correlation coefficient between X and X*.

For example, if f = 0.1, 𝛼 = 0.05, 𝜔 = 0.50, and 𝜌XX∗ = 0.4, then a sample size nv = 210 will ensure that the variance of the adjusted estimate of 𝛽X will not increase by more than about 10% (f = 0.1) as a result of the uncertainty in estimating 𝜆. Clearly, the larger the degree of measurement error, the larger the validation/calibration study that will be needed. If


instead of a study with 50% power, the investigator plans a study with 90% power (𝜔 = 0.10), then nv = 550. Thus, the higher the power desired in the main study, the larger the required validation/calibration study.
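Formula (24) is straightforward to code. A sketch (the function name is ours): note that exact normal quantiles give about 202 and 552 for the two worked examples, whereas the figures 210 and 550 quoted in the text correspond to the conventional rounding of [Φ−1(0.975)]² to 4.

```python
from math import ceil
from statistics import NormalDist

def validation_study_size(rho_xx, f, alpha=0.05, power=0.5):
    """Sample size n_v from formula (24); rho_xx is corr(X, X*)."""
    q = NormalDist().inv_cdf
    z = q(1 - alpha / 2) + q(power)   # power = 1 - omega
    return ceil(z**2 * (1 - rho_xx**2) / (f * rho_xx**2))

# Worked example from the text: f = 0.1, alpha = 0.05, rho = 0.4.
print(validation_study_size(0.4, 0.1, power=0.5))  # 50% power
print(validation_study_size(0.4, 0.1, power=0.9))  # 90% power
```

Either way, the qualitative conclusion is unchanged: greater measurement error (smaller 𝜌XX∗) or higher desired power sharply increases the required ancillary study size.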

Note that the quantity 𝜌XX∗ will not be known precisely before the validation study is conducted—indeed, it is one of the quantities that we hope to estimate from the validation study data. Consequently, a plausible value will need to be specified in order to use formula (24). This is not unlike needing to specify an exposure's hitherto unknown effect on the outcome in order to calculate the sample size of a main study.

If the reference measurement R is an unbiased, but not exact, measure of exposure with random errors, as in a calibration study, then a different formula, based on the approximation var(𝜆) = f𝜆²/[Φ−1(1−𝛼/2)+Φ−1(1−𝜔)]², may be derived for nv. Kaaks et al53 supply details for the setting where there is a log-linear relation between the incidence rate of disease and the error-prone exposure of interest, albeit with very different notation. For problems with two or more exposures measured with error, although the same principles may be used, the sample size formulas have not been derived.

5.2 Calculating statistical power and sample size in the presence of measurement error

5.2.1 Continuous X and X*

In what follows, we assume that the investigator is planning the sample size by specifying the regression coefficient associated with X. In this situation, adjustment for the measurement error needs to be made, as specified below. If, however, the investigator prefers to specify the regression coefficient associated with X* (thus taking the measurement error into account within this specification), then no further adjustment is required and the material that follows in this subsection is irrelevant.

For simplicity, we assume, as in Section 2.1, that the study is designed in order to elucidate the association between a continuous exposure X and health outcome Y, and that X is univariate. We assume that in the main study we observe X*, and not X, where X* measures X with nondifferential linear measurement error. There may or may not be other exactly measured covariates Z that need to be in the model relating Y to X.

In Section 3.1.1, just after Equation (7), we noted that when Y is continuous and is linked to exposure X through a linear regression, then when there are no covariates Z, the measurement error effectively lowers the sample size by the factor 𝜌²XX∗, the squared correlation between X and X*. If, however, there are covariates Z, the measurement error effectively lowers the sample size by the factor 𝜌²XX∗∣Z, the squared partial correlation between X and X* given Z. Thus, to achieve power comparable to what would be obtained if X were observed, using X* requires increasing the sample size, sometimes dramatically.

Indeed, the sample size needs to be increased by the factor 1/𝜌²XX∗ or, where covariates Z are included, by the factor 1/𝜌²XX∗∣Z. For example, if the correlation between X and X* is 0.40, measurement error means that the required sample size will be 6.25 = 1/(0.40)² times larger than it would be if X were observable.

Consider now logistic and Cox regression. As described in Section 3.1.4, the results given above for linear regression serve as good approximations for logistic, Poisson, and Cox regression, under the proviso that 𝛽X remains “small to moderate”—see also Devine and Smith,81 McKeown-Eyssen and Tibshirani,82 and White et al.83 In Tosteson et al,84 power and sample size issues are described for logistic regression but without the “small to moderate” assumption. These authors show in their Figure 2 an example wherein the sample size actually needs to be increased by a further 2%-5% for logistic regression beyond the increase discussed in the previous paragraph.

This discussion suggests the following strategies for calculating sample size and power in studies where the main exposure X is measured with error:

1. Use any information available about the distribution of X and its anticipated association with Y to set a sample size, nX, in the ideal event that X could have been observed. Also, for this sample size, nX, compute the resulting power function, say powX(𝛽X). See Self and Mauritsen85 for such calculations for generalized linear models. Then, when using X*, inflate the sample size as described above to nX∗ = nX/𝜌²XX∗∣Z, and evaluate the power function as powX∗(𝛽X) = powX(𝜆𝛽X), where 𝜆 is the attenuation factor defined in Section 3.1.

2. For a binary outcome and logistic regression, unless using the method in Tosteson et al,84 inflate the sample size by anextra 2%-5%.
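As a numerical sketch of step 1, the inflation by 1∕𝜌²XX∗ can be computed directly. All inputs here (the effect size, the spreads, and the helper required_n) are our own hypothetical choices for illustration, not values from this article:

```python
import math

def required_n(beta, sd_x, sigma_eps):
    # Normal-approximation sample size for a two-sided Wald test of beta in
    # simple linear regression at alpha = 0.05 with 90% power:
    # SE(beta_hat) is roughly sigma_eps / (sd_x * sqrt(n)).
    z_a, z_b = 1.9600, 1.2816          # z_{0.975} and z_{0.90}
    return math.ceil((z_a + z_b) ** 2 * sigma_eps ** 2 / (beta ** 2 * sd_x ** 2))

beta_X, sd_X, sigma = 0.45, 1.0, 3.0   # hypothetical effect and spreads
n_true = required_n(beta_X, sd_X, sigma)   # sample size if X were observable
rho = 0.40                                 # corr(X, X*)
n_err = math.ceil(n_true / rho ** 2)       # inflate by 1/rho^2 when using X*
print(n_true, n_err)
```

With a correlation of 0.40, the error-prone design needs 1∕0.40² = 6.25 times as many subjects, matching the factor quoted above.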

The following is an example of an adjustment to the sample size calculation required for a hypothetical cohort study of the association between the sodium-potassium (Na-K) intake ratio of an individual and all-cause mortality. Previous information indicates that the log of the true usual (ie, long-term average) sodium-potassium intake ratio in the USA has a population standard deviation of 0.35. Suppose also that the expected hazard ratio for all-cause mortality per change of 0.35 in log usual Na-K intake ratio is 1.17, consistent with a study based on a NHANES cohort.86 Then the number of deaths, D, required to provide 1 − 𝜔 power to detect such an effect using a two-sided Wald test at level 𝛼 is given by D = (z1−𝛼∕2 + z1−𝜔)²∕(𝜎²X𝛽²X),87 where 𝛽X is the log hazard ratio for a change of 1 unit in continuous variable X, and 𝜎²X is the variance of X. Setting 𝛼 = 0.05, 𝜔 = 0.1, 𝜎²X = 0.35², and 𝛽X = log(1.17)∕0.35 = 0.45, we obtain D = 423. However, the dietary report instrument we wish to use in our cohort is a FFQ, which measures dietary intake with error. From the OPEN validation study of dietary report instruments, in which unbiased measures of sodium and potassium intakes were obtained,88 the correlation between the FFQ-reported Na-K intake ratio and the true ratio, conditional on age and gender, can be estimated as 0.45. Then the study should be designed to observe 423∕(0.45²) = 2,089 deaths. This approximately 5-fold increase in the required observed number of deaths could be achieved by increasing sample size, increasing length of follow-up, including a larger proportion of elderly persons in the cohort, or some combination of these.

5.2.2 Binary X and X*

In Section 3.2.1, we described attenuation for a continuous outcome Y when both X and X* are binary. The guidelines in Section 5.2.1 will be useful also in this case. While formulae for power and sample size determination can be obtained for the binary X and X* case, to the best of our knowledge there are no publicly available programs to do this. It would be useful to have such programs.

6 ANALYSIS OF STUDIES WHERE ONE OR MORE OF THE MAJOR COVARIATES IS MEASURED WITH ERROR

In Section 3 we described the impact of measurement error in covariates on study results in the event that no statistical adjustments are made for the presence of that measurement error. In Sections 4 and 5 we outlined what additional information is needed to learn about the form and magnitude of measurement error, and how to address the impact of measurement error at the design stage of a study. We now describe in this section two of the simpler methods for adjusting the statistical analysis so that the resulting estimates of key parameters will be free (or approximately free) of the bias induced by measurement error: RC and SIMEX. We focus on regression problems with one or more covariates that are measured with error that conforms to the classical or linear measurement error model. More complex methods for this problem, and methods for other types of problem, including misclassification, are described in Part 2 of this article.

It should be noted that, when the measurement error causes attenuation, the estimates from the two methods described will have larger variance than estimates based on applying the standard analysis as if all variables were measured exactly; such estimates are sometimes referred to as naïve estimates and their bias is described in Section 3.1. How much larger the variance of the adjusted estimates will be depends on the degree of attenuation and on the size of the validation, calibration, or replicates study (see Section 4.1). Thus practitioners are often confronted with a bias-variance trade-off in their choice of estimation method. In the analyses below, the estimated standard errors of the estimates derived from the correction methods account for the extra uncertainty introduced by the measurement error.

We will illustrate RC with an example from the OPEN study.88 This was a dietary intake calibration study (Section 4.2) using unbiased reference measurements, conducted in 484 adult volunteers aged 40-69 years, resident in Maryland, USA in 1999-2000. Participants reported on their dietary intake using a FFQ, provided two 24-hour urine samples for measuring sodium and potassium intake, and provided samples for measuring total energy intake through doubly labeled water. The target dietary measures are considered to be average daily potassium intake density and sodium intake density, where the density is the ratio of the intake to total energy intake. The issue to be addressed will be the association of these intakes with a person's BMI. The questionnaire responses are considered to have linear measurement error, and the urinary data are considered to have classical measurement error (since they measure only a single day's intake with unbiased, independent assay error). The errors in the questionnaire and urinary measurements are assumed to be uncorrelated. Furthermore, the errors in both questionnaire and urinary measurements of potassium and sodium intake density are assumed to be nondifferential with respect to BMI. The dataset is referenced as "Selected OPEN data."89


6.1 Regression calibration

RC is one of the most popular methods of adjusting for nondifferential covariate measurement error. The method is intuitive and relatively easy to implement. For the setting of linear regression of a continuous outcome on covariates, one or more of which is measured with error, RC yields an unbiased estimate of the target regression coefficient if the calibration model (see below) is exactly known, or the parameters of the calibration model can be estimated consistently. There is an extensive literature on this method, with early descriptions including those by Prentice,41 Carroll and Stefanski,90 Armstrong,91 and Rosner et al.80,92,93

6.1.1 Regression calibration with continuous outcome and one covariate measured with error

Suppose that our analysis of the relationship between a continuous outcome Y and variable X is based on the linear regression model E(Y | X, Z) = 𝛽0 + 𝛽XX + 𝛽ZZ, where X is scalar and Z may be scalar or vector. Suppose further that X* measures X with nondifferential linear measurement error. The RC estimator is obtained by performing the regression of outcome Y with the unobserved X replaced, not by X*, but by the predicted value of X given the observed data X* and Z, namely E(X | X*, Z). By iterated expectation, one has:

E(Y | X*, Z) = EX∣X∗,Z{E(Y | X*, Z, X)} = EX∣X∗,Z{E(Y | Z, X)} = EX∣X∗,Z{𝛽0 + 𝛽XX + 𝛽ZZ} = 𝛽0 + 𝛽XE(X | X*, Z) + 𝛽ZZ.

The second equality is due to the crucial assumption that, conditional on X and Z, X* contains no additional information about Y, that is, that the error in X* is nondifferential. From this equation we see that by regressing Y on E(X | X*, Z) and Z, we obtain consistent estimates of the coefficients for both X and Z. Note, this holds true regardless of whether X is continuous, binary, or ordinal, so long as the model for E(X | X*, Z) is correct.

The expression for E(X | X*, Z) is called the calibration equation. If the calibration equation is linear, namely E(X | X*, Z) = 𝜆0 + 𝜆X∗X* + 𝜆ZZ, then the parameter 𝜆X∗ is identical to the attenuation factor discussed in Section 3. In the case of the classical error model (1) of Section 2, when X and the measurement error U are normally distributed, one has that E(X | X*) = 𝜆X* + (1 − 𝜆)E(X*), where 𝜆 is the attenuation factor discussed in Section 3.1.1. This formulation of the calibration equation helps illustrate the effect of measurement error. When the error variance is small relative to the variance of X, 𝜆 is close to 1 and the estimated E(X | X*) makes only a small adjustment to X*; when the error variance is very large, E(X | X*) will be close to E(X*) = E(X).

As discussed in Section 4, estimation of the parameters in the model for E(X | X*, Z) requires an ancillary study. In a validation study in which the true measures of X are observed in some individuals (either internal or external to the main study), the calibration equation can be estimated directly through a regression of X on X* and Z. In a calibration study, such as that in the OPEN study, a reference measurement X** can be obtained on a subset of individuals, and this is thought to follow the classical measurement error model X** = X + V, where V is random error with mean zero. In this case E(X** | X*, Z) = E(X + V | X*, Z) = E(X | X*, Z). Thus, one can obtain the predicted value for E(X | X*, Z) simply by regressing the reference measurement X** from a similar time period on the observed X* and Z. In a replicates study in which repeat measurements X∗1 and X∗2 are available on at least a subset of individuals, and where both are assumed to be subject to the classical measurement error model (1), the predicted value for E(X | X*, Z) can be obtained by regressing X∗2 on X∗1 and Z. Carroll et al6(sec.4.4.2) also provide a formula for the best linear approximation of E(X | X*, Z) using replicate data.

A few authors have considered ways to improve upon the RC estimator, that is, to identify methods that give more precise estimates, in some circumstances by making better use of the data. Efficient maximum likelihood methods for the linear and logistic outcome models with a replicates study were described by Bartlett et al.94 Spiegelman et al95 described efficient approaches for the logistic outcome model with either validation or calibration substudies; this approach considers a weighted estimator that combines separate estimates of the target parameter from the main study and from the calibration/validation substudy.
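The calibration-and-substitution mechanics can be sketched in a small simulation. The data-generating values, the internal validation subsample, and the helper ols_slope are all hypothetical, chosen only to illustrate regressing Y on the predicted E(X | X*):

```python
import random

random.seed(1)

n, n_val = 5000, 500
beta0, betaX = 1.0, 2.0
X = [random.gauss(0, 1) for _ in range(n)]
Xstar = [x + random.gauss(0, 1) for x in X]             # classical error, var(U) = 1
Y = [beta0 + betaX * x + random.gauss(0, 1) for x in X]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

beta_naive = ols_slope(Xstar, Y)            # attenuated, roughly betaX/2 here
# Calibration equation E(X | X*) fitted on the validation subsample:
lam = ols_slope(Xstar[:n_val], X[:n_val])   # attenuation factor, ~0.5
x_pred = [lam * xs for xs in Xstar]         # intercept does not affect the slope
beta_rc = ols_slope(x_pred, Y)              # approximately betaX
print(beta_naive, beta_rc)
```

Because regressing Y on 𝜆̂X* rescales the slope by 1∕𝜆̂, this is numerically the same as the simpler 𝛽̂∗X∕𝜆̂ correction discussed later in this section.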

It is worth noting that RC can also be used for a misclassified binary covariate when there is a validation study which can be used to estimate E(X | X*, Z). When X is binary, the expectation needed for RC can be estimated from a replicates study with three or more repeated measures of X* in a subset of individuals. Two replicates can be sufficient in some settings, but this generally relies either on simplifying assumptions about the error model (such as assuming that sensitivity or specificity is 100%, or that the two are equal) or on the two replicates being observed simultaneously with the binary outcome Y. In the latter case, the information regarding the relationship between X and Y can be used to identify the necessary parameters, but this method has not been observed to perform well when this relationship is weak.96

Sometimes, to obtain more precise estimation of X, one may wish to include further covariates Z# in the calibration equation in addition to the covariates Z originally planned to be in the model for Y.22 The Z# variables that may be used for this purpose are those that are predictive of X, but independent of Y given X and Z, and therefore are not included in the outcome model.

The RC method is closely connected to even simpler methods for measurement error correction, which are based on the relation 𝛽X = 𝛽∗X∕𝜆 shown in Section 3.1.2, where 𝜆 is the attenuation factor and also the slope in the regression of X on X* and Z in the calibration equation used in RC.80,91,97 An estimate of 𝛽X can be obtained from the above relation using an estimate of 𝛽∗X from the naïve analysis unadjusted for measurement error, together with an estimate of 𝜆. An estimate of 𝜆 may be available from an external study, or it can be estimated from an ancillary study as the slope in the calibration regression model. In the latter case, this simpler approach is identical to RC as described above, which instead makes use of the predicted values of E(X | X*, Z) from the calibration equation.

When the outcome model is a linear regression, the attenuation factor 𝜆 can also be estimated using the method of moments. For example, in the situation of a single covariate measured with classical error and no other covariates, 𝜆 = var(X)∕var(X*). The variance of the error-prone measurement X* can be estimated directly from the main study. If a validation study is available, var(X) can be estimated directly within it. In a replicates study, the covariance between the replicate measurements provides an estimate of var(X), using the assumption of classical error and uncorrelated error in the replicate measurements. In this simple setting of classical measurement error the method of moments estimator is equivalent to the RC estimator, because the parameters of the regression model for E(X | X*) used in RC can be defined in terms of the first and second moments. When X* follows the linear measurement error model (2) the expression for 𝜆 is 𝜆 = 𝛼X var(X)∕{𝛼²X var(X) + var(U)}, as given in (7). The individual components of this expression can be estimated using the method of moments from a validation study. However, in the case of a calibration study, estimating var(X) and var(U) requires two or more measures of the unbiased reference measurement, whereas RC can be performed using only one such measurement.17,58 The method of moments approach is conceptually straightforward and, like RC, it extends to situations with covariates Z or with multiple error-prone measurements (Section 6.1.3). As in RC, use of approximations or bootstrapping is still needed to obtain standard errors of the corrected estimates (see Section 6.1.2). Unlike RC, the method of moments does not extend to nonlinear models, though it does extend to the situation of differential error (see Part 2). The method of moments is covered in several of the key texts on measurement error methods.6,7,9,97

We now illustrate RC using an example from the OPEN study, introduced earlier in Section 5, in which we examine the association between BMI and usual (average daily) log potassium density intake. Our dependent variable (Y) is BMI, X is usual log potassium density intake, X* is the reported log FFQ potassium density, which is considered to have linear measurement error, and we control for age and sex (Z). We first perform an analysis unadjusted for the measurement error in FFQ-reported potassium density intake, and regress BMI on log FFQ potassium density intake, age, and sex for the 483 participants who have observations for the variables. The result is shown in Table 2(1), with the coefficient for log potassium density estimated as −1.69 with standard error 0.93 and a z-value of −1.81. This indicates that there may be a negative association between potassium density intake and BMI, with a decrease of approximately 1 BMI unit for every doubling of potassium density intake, not a very large effect.

To perform RC, we develop a prediction equation relating true log potassium density intake to FFQ log potassium density, age, and sex. We assume here that the appropriate form of the equation is linear. The equation is estimated by regressing an unbiased measure of log potassium density intake, the average of the two log urinary potassium values minus the log doubly labeled water value (the reference instrument, X**), on FFQ log potassium density (X*), age, and sex (Z). In most applications the urinary measures, which create a much higher participant burden, and the doubly labeled water exam, which is expensive, are performed in a smaller random subset of participants. Although in our dataset we have these measures in nearly all the participants of this study, we have here sampled only 250 participants to develop the prediction equation. The result is shown in Table 2(2), and the prediction equation for log potassium density intake is: −0.48 + 0.45 × log(FFQ potassium density) + 0.046 × sex + 0.0059 × age. The predicted values obtained using this equation are now used for the log potassium density variable in the regression of BMI on log potassium density, age, and sex.


TABLE 2 Results from analysis of data from 483 participants in the OPEN study: single covariate measured with error

(1) Analyses of the association of log potassium density intake with BMI, using an unadjusted analysis and regression calibration

                              Unadjusted analysis        Regression calibration
Variable                      Est.     SE      z-Value   Est.     SE       z-Value
log FFQ potassium density    −1.69    0.93    −1.81     −3.76    2.43a    −1.55
Sex (F vs M)                 −0.38    0.49    −0.77     −0.20    0.58a    −0.36
Age (years)                   0.039   0.029    1.32      0.061   0.034a    1.81

(2) Prediction model used by regression calibration, based on a random subsample of 250 participants

Variable                      Est.      SE       z-Value
Intercept                    −0.48     0.16     −2.96
log FFQ potassium density     0.45     0.089     5.02
Sex (F vs M)                  0.046    0.044     1.02
Age (years)                   0.0059   0.0027    2.23

Note: Estimated coefficients (Est.), standard errors (SE), and z-values.
a From 5000 bootstrap samples.

The result is shown in Table 2(1), with the estimated coefficient for log potassium density now equal to −3.76. The adjusted estimate of the coefficient indicates a stronger association with BMI than the unadjusted estimate, with a 30% increase in potassium density now associated with a decrease of approximately 1.0 (3.76 × log 1.3) BMI units. It is worth noting that the adjusted coefficient for log potassium density, −3.76, is equal to the unadjusted estimate, −1.69, divided by the coefficient for log FFQ potassium density in the prediction equation, 0.45; this follows the multiplicative relationship between the unadjusted estimate of the coefficient and its true value that was described in Section 3.1.1.
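The multiplicative relationship just noted can be checked with the reported numbers:

```python
beta_naive = -1.69   # unadjusted coefficient, Table 2(1)
lam = 0.45           # slope for log FFQ potassium density, Table 2(2)
beta_rc = beta_naive / lam
print(round(beta_rc, 2))   # -3.76, the adjusted estimate in Table 2(1)
```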

There are many examples of RC, particularly in the nutritional epidemiology literature. A situation that is challenging for RC arises when Z is a strong predictor of both X and the outcome Y, resulting in E(X | X*, Z) and Z being highly correlated. Detailed discussion of this issue is provided by Prentice et al98 and Freedman et al.99

6.1.2 Standard error estimation for regression calibration

Due to the extra uncertainty in the parameters estimated in the calibration model, one cannot use the usual model standard errors from the outcome regression model when performing RC, as these will be too small, resulting in confidence intervals that are too narrow. Instead, standard errors for the estimated coefficients in the outcome model may be obtained either from a bootstrap variance estimator100 or from a sandwich estimator obtained by stacking the calibration and outcome model estimating equations.6(appendixB3),99 Due to its ease of implementation, the bootstrap is commonly used in practice. Commonly, a nonparametric bootstrap is used, and is applied separately to the ancillary study sample and the main study sample; that is, for internal calibration, validation, or replicates studies the bootstrap is stratified by membership of the ancillary study. In some studies, notably when the ancillary study is large, the uncertainty in the estimation of the calibration equation parameters is small relative to that in the estimation of the outcome model parameters, and in this case, ignoring uncertainty in the estimation of the calibration equation parameters may not have a large impact.101 However, we demonstrated in Section 5.1 that uncertainty in the estimation of the calibration equation parameters can have a large impact. It is good practice to make corrections to the model SEs, and doing so is straightforward in practice using bootstrapping.

For the example given in Section 6.1.1, we bootstrapped the entire RC procedure, starting with the estimation of the prediction equation for log potassium density and then incorporating the re-estimated predicted values for E(X | X*, Z) into the regression model for BMI. The bootstrap standard errors of the estimated coefficients in that model are shown in Table 2(1). Note that, just as the adjusted coefficient for log potassium density is much larger than the unadjusted value


(−3.76 vs −1.69), so is its standard error (2.43 vs 0.93). In fact, the relative increase in the standard error is larger than the relative increase in the coefficient, so that the adjusted Wald z-value (−1.55) is smaller in magnitude than the unadjusted value (−1.81). Actually, as seen in Table 1, for a single covariate measured with nondifferential error, the test based on the unadjusted Wald z-value is valid, so there is no need to test the null hypothesis of no association using the adjusted analysis. Moreover, a test based on the adjusted Wald z-value would be less powerful; however, this is because such a test is based on an incorrect assumption of normality for the z-value. A correct test based on the adjusted analysis can be performed using percentile-based bootstrap confidence intervals. Frost and Thompson37 described alternative methods for constructing confidence intervals with correct coverage in this setting, using Fieller's theorem.
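A sketch of the bootstrap described above, on simulated data rather than the OPEN data: the whole RC procedure (calibration fit included) is repeated on each resample, with the ancillary and main samples resampled separately. For compactness the RC estimate is computed via the equivalent ratio 𝛽̂∗X∕𝜆̂ noted earlier; all names and numbers are hypothetical:

```python
import random

random.seed(3)

n, n_val = 2000, 300                     # main study and validation subsample
X = [random.gauss(0, 1) for _ in range(n)]
Xs = [x + random.gauss(0, 1) for x in X]
Y = [2.0 * x + random.gauss(0, 1) for x in X]

def slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def rc_estimate(val_idx, main_idx):
    # Refit the calibration slope, then correct the naive outcome slope.
    lam = slope([Xs[i] for i in val_idx], [X[i] for i in val_idx])
    return slope([Xs[i] for i in main_idx], [Y[i] for i in main_idx]) / lam

val, main = list(range(n_val)), list(range(n))
est = rc_estimate(val, main)
boots = []
for _ in range(200):
    bv = [random.choice(val) for _ in val]     # resample ancillary sample
    bm = [random.choice(main) for _ in main]   # resample main sample
    boots.append(rc_estimate(bv, bm))
m = sum(boots) / len(boots)
se_boot = (sum((b - m) ** 2 for b in boots) / (len(boots) - 1)) ** 0.5
print(est, se_boot)
```

The bootstrap SE reflects both the outcome-model and the calibration-equation uncertainty, so it exceeds the usual model SE from the final regression alone. (Resampling the two samples separately as above is a simplification of the stratified scheme for an internal validation study.)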

6.1.3 Regression calibration when more than one covariate is measured with error

It is not uncommon that multiple error-prone covariates (X1, … , Xp) are of interest (such as smoking, alcohol intake, and physical activity as predictors of blood pressure). As seen in Sections 3.1.3 and 3.2.3, coefficients estimated in a regression that ignores the error in multiple covariates can be biased in either direction. RC can be applied to provide consistent estimates of the coefficients in the outcome model, so long as appropriate data are available to fit the series of calibration equations E(Xi | X*, Z), i = 1, … , p. Note that in this case, there is one equation for each covariate that is measured with error, and X* typically includes all the error-prone covariates. When there is an internal validation or calibration study with simultaneous observation of the vector of error-prone covariates X*, exact covariates Z, and exact or unbiased reference measurements for the vector X, then a multivariate regression model can be fit for E(X | X*, Z). Rosner et al92 develop this approach for the setting of logistic regression with a rare disease outcome and one or more continuous error-prone covariates. When classical measurement error occurs, it is sufficient that independent replicates of the vector X* are measured in a replicates study (see Section 4.2). Carroll et al6(sec.4.4.2) present an RC-based approach for this case, allowing varying numbers of replicates across individuals. A key assumption for this approach is that the replicate observations are independent given the true X. Methods to estimate interactions between two error-prone variables have also been developed.102,103

We illustrate RC with more than one error-prone covariate using an extension of the example in Section 6.1.1. We consider the regression of BMI on log potassium density and log sodium density intakes, while controlling for age and sex. As noted above, the error-prone measurements of potassium density and sodium density are presumed to have nondifferential error with respect to BMI. Hence RC is appropriate for application to this example. In Part 2 we consider instead using absolute average daily sodium intake as the main covariate; this absolute measure is subject to differential error, making RC an unsuitable correction method. The estimated regression coefficients of the unadjusted model, using FFQ-reported intakes, are presented in the left-hand part of Table 3(1). The estimated coefficients for log potassium density and log sodium density are −1.93 and 1.51, with z-values of −2.03 and 1.26, respectively. Unlike in the case of a single error-prone covariate model (Table 2(1)), hypothesis tests based on these z-values are invalid, as highlighted in Table 1. Table 3(2) shows the results of the calibration models for log potassium density and log sodium density, and the right-hand part of Table 3(1) shows the adjusted estimated coefficients for these variables. As in Section 6.1.2, the standard errors are obtained by bootstrapping. One can see that both estimated coefficients are larger in magnitude than their unadjusted versions, the log potassium density coefficient by a factor of 1.6 and the log sodium density coefficient by a factor of 2.4. However, neither coefficient reaches the conventional statistical significance level of 0.05. It is interesting that the similar magnitude but opposite signs of the log potassium density and log sodium density coefficients do suggest that the association with BMI is through the sodium/potassium ratio, a dietary intake measure that has been found to be related to cardiovascular disease.104

6.1.4 Regression calibration in nonlinear models

When the outcome model is nonlinear, RC provides estimates that are only approximately unbiased; however, it has been observed to work remarkably well in generalized linear models, including logistic and Poisson regression.6(sec.4.8),80,105 As discussed in Section 3.1.4, RC usually works well if the magnitude of 𝛽X remains "small to moderate" or the measurement error variance is "small".6(sec.4.8) For Cox regression, Prentice41 demonstrated that RC gives a reasonable approximation in cases of rare disease and either a log hazard ratio of moderate size or small measurement error variance. Xie et al40


TABLE 3 Results from analysis of data from 483 participants in the OPEN study: two covariates measured with error

(1) Analyses of the association of log potassium density intake and log sodium density intake with BMI, using an unadjusted analysis and regression calibration

                              Unadjusted analysis        Regression calibration
Variable                      Est.     SE      z-Value   Est.     SE       z-Value
log FFQ potassium density    −1.93    0.95    −2.03     −3.17    2.34a    −1.35
log FFQ sodium density        1.51    1.20     1.26      3.56    3.82a     0.93
Sex (F vs M)                 −0.33    0.44    −0.67     −0.09    0.60a    −0.16
Age (years)                   0.038   0.029    1.30      0.044   0.038a    1.14

(2) Prediction models used by regression calibration, based on a random subsample of 250 participants

Variable                      Est.      SE       z-Value

Prediction model for log potassium density
Intercept                    −0.48     0.17     −2.84
log FFQ potassium density     0.45     0.091     4.94
log FFQ sodium density       −0.006    0.107    −0.06
Sex (F vs M)                  0.046    0.046     1.00
Age (years)                   0.0059   0.0027    2.26

Prediction model for log sodium density
Intercept                     0.20     0.16      1.27
log FFQ potassium density    −0.14     0.09     −1.66
log FFQ sodium density        0.42     0.10      4.11
Sex (F vs M)                 −0.025    0.043    −0.59
Age (years)                   0.0038   0.0025    1.49

Note: Estimated coefficients (Est.), standard errors (SE), and z-values.
a From 5000 bootstrap samples.

demonstrated that the performance of RC in Cox regression for larger log hazard ratios can be improved by re-estimating E(X | X*, Z) on each risk set. This approach is called "risk set regression calibration" and works well when recalibration is done periodically (say, at every fifth percentile) rather than at each failure time.106 Liao et al107 extended the risk set approach to time-dependent covariates. For highly nonlinear models, RC can be problematic, and Carroll et al6(chap.4) give methods to improve its performance.

6.2 Simulation extrapolation (SIMEX)

Another method which, like RC, enjoys considerable use in practice, is SIMEX. SIMEX is a very general method for measurement error correction in estimating complex regression models in the presence of classical measurement error (see model (1)) in the covariates. It was proposed by Cook and Stefanski108 and enhanced by Carroll et al.109 The method has been extended to a correction for misclassification (MC-SIMEX) by Küchenhoff et al110,111; see Part 2, Section 2.5 for more details. The basic idea of SIMEX is to add more error to the error-prone covariate X*, to see the impact this has on the parameter estimates from the outcome regression model, and then to extrapolate back to the situation with no measurement error. This is equivalent to estimating the relationship between the measurement error variance var(U) and the parameter estimates in the regression ignoring the measurement error, and extrapolating back to the situation in which the error variance is 0. As a simple example, for single covariate regression (see Section 3.1.1) the theoretical relationship between the biased regression slope 𝛽∗X, based on the regression of Y on X*, and var(U) is displayed in Figure 5 as the function 𝛽∗X(var(U)). In practice, this function is estimated from simulated datasets obtained by adding further measurement error to the error-prone covariate and estimating the regression coefficient in each dataset, and then using a model to relate the estimated regression coefficients to var(U). If the estimator for this functional relationship between the error variance and 𝛽∗X is consistent, then in the case of an error-free covariate, 𝛽∗X(0) = 𝛽X. So the function 𝛽∗X(var(U)) is extrapolated back to var(U) = 0 so as to estimate 𝛽X.

FIGURE 5 SIMEX: Relationship between the measurement error variance var(U) and the regression coefficient 𝛽∗X. 𝛽X is the regression coefficient in a regression of Y on X, and 𝛽∗X is the regression coefficient in a regression of Y on X*. X* is assumed to follow the classical error model X* = X + U, and 𝛽∗X = 𝛽X var(X)∕{var(X) + var(U)}. Here, var(X) = 1 and 𝛽X = 1

FIGURE 6 SIMEX estimation for the association between individual heart rate as outcome variable and log-transformed individual particle number concentration (a measure of air pollution exposure) measured in number per cm3. The SIMEX estimator is assessed assuming a measurement error variance of 0.03, which is determined through comparison measurements, a quadratic extrapolation function, number of simulations B = 100, and s = (1, 1.5, 2, 2.5, 3). The analysis is based on longitudinal data from an observational study described in Peters et al.72 The model accounts for temperature, relative humidity, time trend, and time of the day. The solid curve is obtained from the fit of the extrapolation model to the pseudo-datasets, and the dotted line represents the extrapolated part

More broadly, we assume a general regression model with a covariate X measured with classical measurement error U, that is, we observe X* = X + U. Further covariates Z without measurement error may be included in the model (see Section 3.1.2). The parameters of interest are denoted by the vector β_X. Given data (Y_i, X*_i, Z_i), i = 1, …, n, the unadjusted estimator ignoring measurement error is denoted by β*_X[(Y_i, X*_i, Z_i)_{i=1}^n]. The simulation and extrapolation steps then proceed as follows.

Simulation step: For each value of a fixed grid of values s_1, …, s_m (≥ 0), B new pseudo-datasets are simulated by

X*_{ib}(s_k) = X*_i + √(s_k var(U)) U_{ibk},   i = 1, …, n; b = 1, …, B; k = 1, …, m,

where the U_{ibk} are independent identically distributed standard normal variables. Note that the measurement error variance of X*_{ib}(s_k) is (1 + s_k) var(U). For each pseudo-dataset the unadjusted estimator is given by β*_X[(Y_i, X*_{ib}(s_k), Z_i)_{i=1}^n]. We denote the average over the B repetitions by β*_{X,s_k}, which we calculate as

β*_{X,s_k} = B^{-1} Σ_{b=1}^B β*_X[(Y_i, X*_{ib}(s_k), Z_i)_{i=1}^n],   k = 1, …, m.

Furthermore, we set s_0 = 0 and β*_{X,s_0} = β*_X[(Y_i, X*_i, Z_i)_{i=1}^n].

Extrapolation step: To make the necessary extrapolation, we use a parametric approximation for β*_X((1 + s) var(U)), setting it equal to a function of s denoted by G(s, Γ), where Γ denotes the parameters of the model. We estimate the parameters Γ by least squares, with [β*_{X,s_k}]_{k=0}^m as the dependent variable data and functions of [1 + s_k]_{k=0}^m as the independent variables, yielding an estimator Γ̂. The estimated parametric function is then extrapolated to s = −1, the case of no measurement error. The SIMEX estimator is then defined by

β_SIMEX = G(−1, Γ̂).   (25)
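The two steps above can be sketched in a few lines of code. The following is a minimal illustrative implementation for the slope of a simple linear regression with known error variance var(U), using a quadratic extrapolation function; the function name simex_slope and its defaults are our own, not the authors' supplemental code.

```python
import numpy as np

def simex_slope(y, x_star, var_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=100, seed=0):
    """SIMEX for the slope of a simple linear regression of y on an error-prone
    covariate x_star = x + u, with known classical error variance var_u.
    Quadratic extrapolation G(s) = g0 + g1*s + g2*s^2, evaluated at s = -1."""
    rng = np.random.default_rng(seed)

    def naive_slope(x):  # unadjusted estimator, ignoring measurement error
        return np.cov(x, y, bias=True)[0, 1] / np.var(x)

    s_grid = [0.0] + list(lambdas)
    beta_s = [naive_slope(x_star)]  # s0 = 0 gives the unadjusted estimate
    for s in lambdas:
        # Simulation step: add extra error with variance s*var_u, B times,
        # so each pseudo-dataset has total error variance (1 + s)*var_u
        estimates = [naive_slope(x_star + np.sqrt(s * var_u) *
                                 rng.standard_normal(len(y)))
                     for _ in range(B)]
        beta_s.append(np.mean(estimates))
    # Extrapolation step: fit the quadratic in s by least squares and
    # extrapolate to s = -1, the case of no measurement error
    return np.polyval(np.polyfit(s_grid, beta_s, 2), -1.0)
```

With σ²_X = 1 and var(U) = 0.25, the naive slope is attenuated toward 0.8 β_X; the extrapolated estimate removes most of this attenuation, although a small bias remains because the quadratic is only an approximation to the true extrapolant.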


KEOGH et al. 29

T A B L E 4 Overview of available software for performing regression calibration

Procedure rcal within the Stata package merror
Website: http://www.stata.com/merror/; see also http://www.stat.tamu.edu/~carroll/eiv.SecondEdition/statacode.php
Notes: For generalized linear models with X* having classical measurement error where (a) the error variance is known; (b) replicate measurements of X* are available; or (c) an "instrumental variable" is also available that is correlated with X and whose errors are independent of the errors in X*.
Reference: Hardin et al116

Procedure eivreg within Stata
Website: http://www.stata.com/manuals13/reivreg.pdf
Notes: For linear regression where X* has classical measurement error and the error variance (or the ratio of the error variance to the total variance) is known.
Reference: Hardin et al116

NCI SAS macros
Website: https://epi.grants.cancer.gov/diet/usualintakes/macros.html
Notes: (a) For X* measured in all individuals that, after suitable transformation, satisfies a linear measurement error model, together with an X** measured in a subsample that, after suitable transformation, has classical measurement error. (b) For X* measured in all individuals that, after suitable transformation, has classical measurement error; a substantial subsample should have at least one repeat value of X*. In these options X* and X** may be univariate, bivariate, or multivariate, and may have excess zeros.
Reference: Kipnis et al22

Spiegelman SAS macro %blinplus
Website: https://www.hsph.harvard.edu/donna-spiegelman/software/
Notes: For univariate or multivariate X* measured in all individuals in the main study, together with a validation study in which both X and X* are measured in all individuals. X* satisfies the linear measurement error model (2).
Reference: Rosner et al92

Spiegelman SAS macro %relibpls8
Website: https://www.hsph.harvard.edu/donna-spiegelman/software/
Notes: For univariate or multivariate X* measured in all individuals, with repeat measurements of X*. X* satisfies the classical measurement error model (1).
Reference: Rosner et al93

Spiegelman SAS macro %rrc
Website: https://www.hsph.harvard.edu/donna-spiegelman/software/
Notes: For a time-varying covariate X in a Cox regression model. X* satisfies the linear measurement error model (2). The method uses risk-set regression calibration for estimating the risk parameters (see Section 5.1.4).
Reference: Liao et al107

When β_X is a vector, the SIMEX estimator can be applied separately for each component. The estimator β_SIMEX is consistent when the extrapolation function is correctly specified. In complex regression models, the extrapolation function is unknown and usually does not have a suitable parametrization. However, a quadratic function, defined as G_Q(s, Γ) = γ_0 + γ_1 s + γ_2 s², is a good approximation in many practical applications. For the estimation of the variance of β_SIMEX, three methods are available: (i) an asymptotic approach using the delta method, see Carroll et al109; (ii) a simulation-based method proposed by Stefanski and Cook112; and (iii) the nonparametric bootstrap, performing the whole SIMEX procedure in each bootstrap sample. When the measurement error variance is estimated from an ancillary study, the bootstrap procedure includes a sample from the ancillary study and simultaneously a sample from the main study in each repetition; the uncertainty about the measurement error variance is then also addressed. The SIMEX method is illustrated in an example using the R package simex by Lederer and Küchenhoff.113 Figure 6 displays the SIMEX extrapolation curve for the regression coefficient of heart rate on log particle number concentration (a measure of air pollution exposure) in individuals with impaired glucose metabolism or diabetes.72
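Option (iii) can be sketched as follows: each bootstrap replicate resamples the main-study individuals and reruns the entire SIMEX procedure. The helper names bootstrap_simex_se and simex_fn are our own illustrative choices, with simex_fn standing in for whatever SIMEX routine is used; if var(U) were estimated from an ancillary study, a bootstrap sample of that study would be drawn inside the loop as well.

```python
import numpy as np

def bootstrap_simex_se(y, x_star, var_u, simex_fn, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error for a SIMEX estimator.
    simex_fn(y, x_star, var_u) must rerun the whole SIMEX procedure;
    each replicate resamples the main-study rows with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals
        # If var_u came from an ancillary study, re-estimate it here from a
        # bootstrap sample of that study before calling simex_fn.
        reps.append(simex_fn(y[idx], x_star[idx], var_u))
    return np.std(reps, ddof=1)
```

The standard deviation of the replicate estimates is the bootstrap standard error; percentile intervals can be read off the same set of replicates.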

The SIMEX method is attractive due to its very general applicability. It requires only a well-defined estimation procedure leading to a consistent estimator in the case of error-free covariates. It also requires knowledge about the



T A B L E 5 Overview of available software for performing simulation extrapolation (SIMEX)

Package simex within the R language
Website: https://cran.r-project.org/web/packages/simex/simex.pdf
Notes: (a) Performs SIMEX for a wide range of models with X* having classical measurement error and known error variance. (b) Performs MC-SIMEX for a wide range of models with categorical X* having misclassification with a known matrix of misclassification probabilities. The jackknife method (Stefanski and Cook112) and an asymptotic approach (Carroll et al109; Küchenhoff et al110) are available for estimating the standard errors of the estimated coefficients.
References: Cook and Stefanski,108 Küchenhoff et al,111 Lederer and Küchenhoff113

Procedures simex and simexplot within the Stata package merror
Website: http://www.stata.com/merror/
Notes: Performs SIMEX for generalized linear models with covariate X* having classical measurement error and known error variance or repeat measurements.
Reference: Hardin et al117

Package simexaft within the R language
Website: https://cran.r-project.org/web/packages/simexaft/simexaft.pdf
Notes: Performs SIMEX for the accelerated failure time model with covariate X* having classical measurement error and known error variance or repeat measurements. X* may be multivariate.
Reference: He et al118

measurement error model parameters, for example through an ancillary study of one of the types described in Section 4, but this is necessary for most measurement error correction procedures. This makes the procedure attractive for complex models where other methods are not feasible. Furthermore, the plot produced by the SIMEX procedure gives a convenient description of the effect of measurement error on parameter estimation. The Achilles heel of the procedure is the usually unknown extrapolation function. Especially for large measurement error, when extrapolation is to a more distant point, the procedure should be checked by simulation studies. Note that SIMEX needs only an error structure that can be described by X* = X + U. Here, U can be correlated with Y and Z, which makes SIMEX applicable for special types of differential measurement error. SIMEX has also been extended to more complex measurement error structures; for a recent example with spatial data see Alexeff et al.114 In another example, SIMEX has been applied to adjust for measurement error in a survival outcome; see Oh et al.115

7 SOFTWARE FOR ANALYSIS

One of the main barriers in the past to the use of the analysis methods described in Section 6 was the lack of specific software for their implementation, though the situation is gradually improving. Several programs or macros for performing RC are now available, including the rcal command for generalized linear models in Stata116; they are summarized in Table 4. Three packages for performing SIMEX are also now available; they are summarized in Table 5. One performs SIMEX for a wide range of regression models (simex in R),113 one performs SIMEX for generalized linear models (procedures simex and simexplot within the merror package in Stata),117 and one performs SIMEX for accelerated failure time models (simexaft in R).118

In supplemental materials available at https://github.com/PamelaShaw/STRATOS-TG4-Guidance-Paper, we provide code that demonstrates the implementation of the RC and SIMEX analyses for each of the data examples.

8 CONCLUSION

We have presented here the background and key results required for understanding the impact of measurement error on the results of epidemiological research studies, together with some relatively simple methods available to adjust for such measurement error in continuous covariates used in regression models. In most cases, the limiting factor for performing these adjusted analyses will be the availability of quantitative information regarding the measurement error. However,



our impression is that even when such quantitative information is available, the adjusted analyses are not being performed.5,119

In Part 2 we will describe a variety of additional topics, many of which build on what we have covered here. In particular, we will present other methods of adjusting for measurement error, such as a likelihood-based approach, multiple imputation and moment reconstruction, and Bayesian methods. We will also describe methods of adjustment for misclassified discrete variables and methods for adjusting estimates of distributions. Finally, we will touch on a number of more advanced topics, such as mixtures of Berkson and classical error and variable selection, concluding with a discussion of how to proceed when the information on measurement error is incomplete.

ACKNOWLEDGEMENTS
This research is supported in part by the National Institutes of Health (NIH) grants R01-AI131771 (P.A.S.), U01-CA057030 (R.J.C.), and NCI P30CA012197 (J.A.T.); Patient Centered Outcomes Research Institute (PCORI) Award R-1609-36207 (P.A.S.); and Natural Sciences and Engineering Research Council of Canada (NSERC) RGPIN-2019-03957 (P.G.). The statements in this manuscript are solely the responsibility of the authors and do not necessarily represent the views of NIH, PCORI, or NSERC.

DATA AVAILABILITY STATEMENT
The OPEN Study data that illustrate the methods presented in this paper are available upon request from [email protected]. The request should specify the dataset used in analyses presented in the papers by Keogh et al (2020) and Shaw et al (2020). More information about these data can be obtained at https://epi.grants.cancer.gov/past-initiatives/open/. The software code files for implementing our examples based on the OPEN study may be found online at https://github.com/PamelaShaw/STRATOS-TG4-Guidance-Paper.

ORCID
Ruth H. Keogh https://orcid.org/0000-0001-6504-3253
Pamela A. Shaw https://orcid.org/0000-0003-1883-8410
Paul Gustafson https://orcid.org/0000-0002-2375-5006
Laurence S. Freedman https://orcid.org/0000-0002-6767-7900

REFERENCES
1. Murray RP, Connett JE, Lauger GG, Voelker HT. Error in smoking measures: effects of intervention on relations of cotinine and carbon monoxide to self-reported smoking. The Lung Health Study Research Group. Am J Public Health. 1993;83(9):1251-1257. https://doi.org/10.2105/ajph.83.9.1251.
2. Thiébaut ACM, Freedman LS, Carroll RJ, Kipnis V. Is it necessary to correct for measurement error in nutritional epidemiology? Ann Intern Med. 2007;146(1):65. https://doi.org/10.7326/0003-4819-146-1-200701020-00012.
3. Ferrari P, Friedenreich C, Matthews CE. The role of measurement error in estimating levels of physical activity. Am J Epidemiol. 2007;166(7):832-840. https://doi.org/10.1093/aje/kwm148.
4. Zeger SL, Thomas D, Dominici F, et al. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect. 2000;108(5):419-426. https://doi.org/10.1289/ehp.00108419.
5. Shaw PA, Deffner V, Keogh RH, et al. Epidemiologic analyses with error-prone exposures: review of current practice and recommendations. Ann Epidemiol. 2018;28(11):821-828. https://doi.org/10.1016/j.annepidem.2018.09.001.
6. Carroll R, Ruppert D, Stefanski L, Crainiceanu C. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Boca Raton, FL: Chapman and Hall; 2006.
7. Buonaccorsi J. Measurement Error: Models, Methods, and Applications. Boca Raton, FL: Chapman and Hall; 2010.
8. Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Boca Raton, FL: Chapman and Hall; 2004.
9. Yi G. Statistical Analysis with Measurement Error or Misclassification. New York, NY: Springer; 2017.
10. Keogh R, White I. A toolkit for measurement error correction, with a focus on nutritional epidemiology. Stat Med. 2014;33(12):2137-2155. https://doi.org/10.1002/sim.6095.
11. Armstrong BG. Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med. 1998;55(10):651-656. https://doi.org/10.1136/oem.55.10.651.
12. Buzas J, Stefanski L, Tosteson T. Measurement error. In: Ahrens W, Pigeot I, eds. Handbook of Epidemiology. New York, NY: Springer; 2014:1241-1282.
13. Cochran W. Errors of measurement in statistics. Technometrics. 1968;10(4):637-666.


14. Chen Z, Peto R, Collins R, MacMahon S, Lu J, Li W. Serum cholesterol concentration and coronary heart disease in population with low cholesterol concentrations. BMJ. 1991;303(6797):276-282. https://doi.org/10.1136/bmj.303.6797.276.
15. MacMahon S, Peto R, Cutler J, et al. Blood pressure, stroke, and coronary heart disease. Part 1, prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet. 1990;335(8692):765-774. https://doi.org/10.1016/0140-6736(90)90878-9.
16. Freedman LS, Carroll RJ, Wax Y. Estimating the relation between dietary intake obtained from a food frequency questionnaire and true average intake. Am J Epidemiol. 1991;134(3):310-320. https://doi.org/10.1093/oxfordjournals.aje.a116086.
17. Kipnis V, Subar AF, Midthune D, et al. Structure of dietary measurement error: results of the OPEN biomarker study. Am J Epidemiol. 2003;158(1):14-21. https://doi.org/10.1093/aje/kwg091.
18. Prentice RL, Sugar E, Wang C, Neuhouser M, Patterson R. Research strategies and the use of nutrient biomarkers in studies of diet and chronic disease. Public Health Nutr. 2002;5(6a):977-984. https://doi.org/10.1079/PHN2002382.
19. Spiegelman D, Logan R, Grove D. Regression calibration with heteroscedastic error variance. Int J Biostat. 2011;7(1):4-34. https://doi.org/10.2202/1557-4679.1259.
20. Chen X, Hong H, Nekipelov D. Nonlinear models of measurement errors. J Econ Lit. 2011;49:901-937. https://doi.org/10.2307/23071661.
21. Lyles RH, Kupper LL. A detailed evaluation of adjustment methods for multiplicative measurement error in linear regression with applications in occupational epidemiology. Biometrics. 1997;53(3):1008-1025. https://doi.org/10.2307/2533560.
22. Kipnis V, Midthune D, Buckman DW, et al. Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics. 2009;65(4):1003-1010. https://doi.org/10.1111/j.1541-0420.2009.01223.x.
23. Berkson J. Are there two regressions? J Am Stat Assoc. 1950;45(250):164. https://doi.org/10.2307/2280676.
24. Oraby T, Sivaganesan S, Bowman JD, et al. Berkson error adjustment and other exposure surrogates in occupational case-control studies, with application to the Canadian INTEROCC study. J Expo Sci Environ Epidemiol. 2018;28(3):251-258. https://doi.org/10.1038/jes.2017.2.
25. Goldman GT, Mulholland JA, Russell AG, et al. Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies. Environ Health. 2011;10:61. https://doi.org/10.1186/1476-069X-10-61.
26. Tooze JA, Troiano RP, Carroll RJ, Moshfegh AJ, Freedman LS. A measurement error model for physical activity level as measured by a questionnaire with application to the 1999-2006 NHANES questionnaire. Am J Epidemiol. 2013;177(11):1199-1208. https://doi.org/10.1093/aje/kws379.
27. Willett W. Nutritional Epidemiology. 3rd ed. New York, NY: Oxford University Press; 2013.
28. Tielemans E, Kupper LL, Kromhout H, Heederik D, Houba R. Individual-based and group-based occupational exposure assessment: some equations to evaluate different strategies. Ann Occup Hyg. 1998;42(2):115-119. https://doi.org/10.1016/s0003-4878(97)00051-3.
29. Peters PJ, Westheimer E, Cohen S, et al. Screening yield of HIV antigen/antibody combination and pooled HIV RNA testing for acute HIV infection in a high-prevalence population. JAMA. 2016;315(7):682-690. https://doi.org/10.1001/jama.2016.0286.
30. Flegal KM, Keyl PM, Nieto FJ. Differential misclassification arising from nondifferential errors in exposure measurement. Am J Epidemiol. 1991;134(10):1233-1246. https://doi.org/10.1093/oxfordjournals.aje.a116026.
31. Wacholder S, Dosemeci M, Lubin JH. Blind assignment of exposure does not always prevent differential misclassification. Am J Epidemiol. 1991;134(4):433-437. https://doi.org/10.1093/oxfordjournals.aje.a116105.
32. Brenner H. Notes on the assessment of trend in the presence of nondifferential exposure misclassification. Epidemiology. 1992;3(5):420-427.
33. Wang D, Gustafson P. On the impact of misclassification in an ordinal exposure variable. Epidemiol Methods. 2014;3(1):97-106. https://doi.org/10.1515/em-2013-0017.
34. Wang D, Shen T, Gustafson P. Partial identification arising from nondifferential exposure misclassification: how informative are data on the unlikely, maybe, and likely exposed? Int J Biostat. 2012;8(1). https://doi.org/10.1515/1557-4679.1397.
35. McInturff P, Johnson WO, Cowling D, Gardner IA. Modelling risk when binary outcomes are subject to error. Stat Med. 2004;23(7):1095-1109. https://doi.org/10.1002/sim.1656.
36. Lyles RH, Tang L, Superak HM, et al. Validation data-based adjustments for outcome misclassification in logistic regression: an illustration. Epidemiology. 2011;22(4):589-597. https://doi.org/10.1097/EDE.0b013e3182117c85.
37. Frost C, Thompson SG. Correcting for regression dilution bias: comparison of methods for a single predictor variable. J R Stat Soc Ser A. 2000;163:173-189. https://doi.org/10.2307/2680496.
38. Lagakos SW. Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Stat Med. 1988;7(1-2):257-274. https://doi.org/10.1002/sim.4780070126.
39. Freedman LS, Schatzkin A, Midthune D, Kipnis V. Dealing with dietary measurement error in nutritional cohort studies. J Natl Cancer Inst. 2011;103(14):1086-1092. https://doi.org/10.1093/jnci/djr189.
40. Xie SX, Wang CY, Prentice RL. A risk set calibration method for failure time regression by using a covariate reliability sample. J R Stat Soc Ser B. 2001;63(4):855-870. https://doi.org/10.1111/1467-9868.00317.
41. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982;69(2):331-342. https://doi.org/10.1093/biomet/69.2.331.
42. Bross I. Misclassification in 2 × 2 tables. Biometrics. 1954;10(4):478. https://doi.org/10.2307/3001619.
43. Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol. 1990;132(4):746-748. https://doi.org/10.1093/oxfordjournals.aje.a115716.


44. Weinberg CA, Umbach DM, Greenland S. When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol. 1994;140(6):565-571. https://doi.org/10.1093/oxfordjournals.aje.a117283.
45. Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI, Freedman LS. Statistical issues related to dietary intake as the response variable in intervention trials. Stat Med. 2016;35(25):4493-4508. https://doi.org/10.1002/sim.7011.
46. Hyslop DR, Imbens GW. Bias from classical and other forms of measurement error. J Bus Econ Stat. 2001;19(4):475-481. https://doi.org/10.1198/07350010152596727.
47. Magder LS, Hughes JP. Logistic regression when the outcome is measured with uncertainty. Am J Epidemiol. 1997;146(2):195-203. https://doi.org/10.1093/oxfordjournals.aje.a009251.
48. Tooze JA, Kipnis V, Buckman DW, et al. A mixed-effects model approach for estimating the distribution of usual intake of nutrients: the NCI method. Stat Med. 2010;29(27):2857-2868. https://doi.org/10.1002/sim.4063.
49. Mielgo-Ayuso J, Aparicio-Ugarriza R, Castillo A, et al. Physical activity patterns of the Spanish population are mostly determined by sex and age: findings in the ANIBES study. PLoS One. 2016;11(2):e0149969. https://doi.org/10.1371/journal.pone.0149969.
50. Kaaks R, Ferrari P, Ciampi A, Plummer M, Riboli E. Uses and limitations of statistical accounting for random error correlations, in the validation of dietary questionnaire assessments. Public Health Nutr. 2002;5(6A):969-976. https://doi.org/10.1079/PHN2002380.
51. Kaaks R, Riboli E. Validation and calibration of dietary intake measurements in the EPIC project: methodological considerations. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol. 1997;26(suppl 1):S15-S25. https://doi.org/10.1093/ije/26.suppl_1.s15.
52. O'Brien TE, Funk GM. A gentle introduction to optimal design for regression models. Am Stat. 2003;57(4):265-267. https://doi.org/10.2307/30037294.
53. Kaaks R, Riboli E, van Staveren W. Calibration of dietary intake measurements in prospective cohort studies. Am J Epidemiol. 1995;142(5):548-556. https://doi.org/10.1093/oxfordjournals.aje.a117673.
54. Subar AF, Freedman LS, Tooze JA, et al. Addressing current criticism regarding the value of self-report dietary data. J Nutr. 2015;145(12):2639-2645. https://doi.org/10.3945/jn.115.219634.
55. Freedman LS, Commins JM, Willett W, et al. Evaluation of the 24-hour recall as a reference instrument for calibrating other self-report instruments in nutritional cohort studies: evidence from the Validation Studies Pooling Project. Am J Epidemiol. 2017;186(1):73-82. https://doi.org/10.1093/aje/kwx039.
56. Lampe JW, Huang Y, Neuhouser ML, et al. Dietary biomarker evaluation in a controlled feeding study in women from the Women's Health Initiative cohort. Am J Clin Nutr. 2017;105(2):466-475. https://doi.org/10.3945/ajcn.116.144840.
57. Michels KB, Welch AA, Luben R, Bingham SA, Day NE. Measurement of fruit and vegetable consumption with diet questionnaires and implications for analyses and interpretation. Am J Epidemiol. 2005;161(10):987-994. https://doi.org/10.1093/aje/kwi115.
58. Keogh RH, White IR, Rodwell SA. Using surrogate biomarkers to improve measurement error models in nutritional epidemiology. Stat Med. 2013;32(22):3838-3861. https://doi.org/10.1002/sim.5803.
59. Beaton GH, Milner J, Corey P, et al. Sources of variance in 24-hour dietary recall data: implications for nutrition study design and interpretation. Am J Clin Nutr. 1979;32(12):2546-2559. https://doi.org/10.1093/ajcn/32.12.2546.
60. Spiegelman D, Schneeweiss S, McDermott A. Measurement error correction for logistic regression models with an "alloyed gold standard". Am J Epidemiol. 1997;145(2):184-196. https://doi.org/10.1093/oxfordjournals.aje.a009089.
61. Nusser SM, Beyler NK, Welk GJ, Carriquiry AL, Fuller WA, King BMN. Modeling errors in physical activity recall data. J Phys Act Health. 2012;9(suppl 1):S56-S67.
62. Neuhouser ML, Di C, Tinker LF, et al. Physical activity assessment: biomarkers and self-report of activity-related energy expenditure in the WHI. Am J Epidemiol. 2013;177(6):576-585. https://doi.org/10.1093/aje/kws269.
63. Lim S, Wyker B, Bartley K, Eisenhower D. Measurement error of self-reported physical activity levels in New York City: assessment and correction. Am J Epidemiol. 2015;181(9):648-655. https://doi.org/10.1093/aje/kwu470.
64. Matthews CE, Kozey Keadle S, Moore SC, et al. Measurement of active and sedentary behavior in context of large epidemiologic studies. Med Sci Sports Exerc. 2018;50(2):266-276. https://doi.org/10.1249/MSS.0000000000001428.
65. Shaw PA, McMurray R, Butte N, et al. Calibration of activity-related energy expenditure in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). J Sci Med Sport. 2019;22(3):300-306. https://doi.org/10.1016/J.JSAMS.2018.07.021.
66. Schoeller DA. Measurement of energy expenditure in free-living humans by using doubly labeled water. J Nutr. 1988;118(11):1278-1289. https://doi.org/10.1093/jn/118.11.1278.
67. Tolonen H, Wolf H, Jakovljevic D, Kuulasmaa K, EHRM Project. Review of Surveys for Risk Factors of Major Chronic Diseases and Comparability of the Results. Section 7: Smoking; 2002. www.thi.fi/publications/ehrm/product1/section7.htm.
68. West R, Hajek P, Stead L, Stapleton J. Outcome criteria in smoking cessation trials: proposal for a common standard. Addiction. 2005;100(3):299-303. https://doi.org/10.1111/j.1360-0443.2004.00995.x.
69. Rebagliato M. Validation of self reported smoking. J Epidemiol Community Health. 2002;56(3):163-164. https://doi.org/10.1136/jech.56.3.163.
70. Koehler KA, Peters TM. New methods for personal exposure monitoring for airborne particles. Curr Environ Health Rep. 2015;2(4):399-411. https://doi.org/10.1007/s40572-015-0070-z.
71. Ozkaynak H, Xue J, Spengler J, Wallace L, Pellizzari E, Jenkins P. Personal exposure to airborne particles and metals: results from the Particle TEAM study in Riverside, California. J Expo Anal Environ Epidemiol. 1996;6(1):57-78.


72. Peters A, Hampel R, Cyrys J, et al. Elevated particle number concentrations induce immediate changes in heart rate variability: a panel study in individuals with impaired glucose metabolism or diabetes. Part Fibre Toxicol. 2015;12:7. https://doi.org/10.1186/s12989-015-0083-7.
73. Deffner V, Küchenhoff H, Breitner S, Schneider A, Cyrys J, Peters A. Mixtures of Berkson and classical covariate measurement error in the linear mixed model: bias analysis and application to a study on ultrafine particles. Biom J. 2018;60(3):480-497. https://doi.org/10.1002/bimj.201600188.
74. Kioumourtzoglou M-A, Spiegelman D, Szpiro AA, et al. Exposure measurement error in PM2.5 health effects studies: a pooled analysis of eight personal exposure validation studies. Environ Health. 2014;13(1):2. https://doi.org/10.1186/1476-069X-13-2.
75. Coggon D, Rose G, Barker D. Epidemiology for the Uninitiated. 5th ed. London, UK: BMJ Publishing Group; 2003. http://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated.
76. Burkhauser RV, Cawley J. Beyond BMI: the value of more accurate measures of fatness and obesity in social science research. J Health Econ. 2008;27(2):519-529. https://doi.org/10.1016/J.JHEALECO.2007.05.005.
77. Spiegelman D. Cost-efficient study designs for relative risk modeling with covariate measurement error. J Stat Plan Inference. 1994;42(1-2):187-208. https://doi.org/10.1016/0378-3758(94)90196-1.
78. Reilly M. Optimal sampling strategies for two-stage studies. Am J Epidemiol. 1996;143(1):92-100. https://doi.org/10.1093/oxfordjournals.aje.a008662.
79. Holford TR, Stack C. Study design for epidemiologic studies with measurement error. Stat Methods Med Res. 1995;4(4):339-358. https://doi.org/10.1177/096228029500400405.
80. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med. 1989;8(9):1051-1069. https://doi.org/10.1002/sim.4780080905.
81. Devine OJ, Smith JM. Estimating sample size for epidemiologic studies: the impact of ignoring exposure measurement uncertainty. Stat Med. 1998;17(12):1375-1389. https://doi.org/10.1002/(SICI)1097-0258(19980630)17:12%3C1375::AID-SIM857%3E3.0.CO;2-D.
82. McKeown-Eyssen GE, Tibshirani R. Implications of measurement error in exposure for the sample sizes of case-control studies. Am J Epidemiol. 1994;139(4):415-421. https://doi.org/10.1093/oxfordjournals.aje.a117014.
83. White E, Kushi LH, Pepe MS. The effect of exposure variance and exposure measurement error on study sample size: implications for the design of epidemiologic studies. J Clin Epidemiol. 1994;47(8):873-880. https://doi.org/10.1016/0895-4356(94)90190-2.
84. Tosteson TD, Buzas JS, Demidenko E, Karagas M. Power and sample size calculations for generalized regression models with covariate measurement error. Stat Med. 2003;22(7):1069-1082. https://doi.org/10.1002/sim.1388.
85. Self SG, Mauritsen RH. Power/sample size calculations for generalized linear models. Biometrics. 1988;44(1):79. https://doi.org/10.2307/2531897.
86. Yang Q, Liu T, Kuklina EV, et al. Sodium and potassium intake and mortality among US adults. Arch Intern Med. 2011;171(13):1183-1191. https://doi.org/10.1001/archinternmed.2011.257.
87. Hsieh FY, Lavori PW. Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Control Clin Trials. 2000;21(6):552-560. https://doi.org/10.1016/S0197-2456(00)00104-5.
88. Subar AF, Kipnis V, Troiano RP, et al. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. Am J Epidemiol. 2003;158(1):1-13. https://doi.org/10.1093/aje/kwg092.
89. Selected OPEN Data. https://epi.grants.cancer.gov/past-initiatives/open/. Accessed September 5, 2019.
90. Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. J Am Stat Assoc. 1990;85(411):652-663. https://doi.org/10.1080/01621459.1990.10474925.
91. Armstrong B. Measurement error in the generalised linear model. Commun Stat Simul Comput. 1985;14(3):529-544. https://doi.org/10.1080/03610918508812457.
92. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol. 1990;132(4):734-745. https://doi.org/10.1093/oxfordjournals.aje.a115715.
93. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. Am J Epidemiol. 1992;136(11):1400-1413. https://doi.org/10.1093/oxfordjournals.aje.a116453.
94. Bartlett JW, De Stavola BL, Frost C. Linear mixed models for replication data to efficiently allow for covariate measurement error. Stat Med. 2009;28(25):3158-3178. https://doi.org/10.1002/sim.3713.
95. Spiegelman D, Carroll RJ, Kipnis V. Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument. Stat Med. 2001;20(1):139-160. https://doi.org/10.1002/1097-0258(20010115)20:1<139::AID-SIM644>3.0.CO;2-K.
96. White I, Frost C, Tokunaga S. Correcting for measurement error in binary and continuous variables using replicates. Stat Med. 2001;20(22):3441-3457. https://doi.org/10.1002/sim.908.
97. Fuller W. Measurement Error Models. New York, NY: John Wiley & Sons, Inc; 1987.
98. Prentice RL, Huang Y, Kuller LH, et al. Biomarker-calibrated energy and protein consumption and cardiovascular disease risk among postmenopausal women. Epidemiology. 2011;22(2):170-179. https://doi.org/10.1097/EDE.0b013e31820839bc.
99. Freedman LS, Midthune D, Carroll RJ, et al. Using regression calibration equations that combine self-reported intake and biomarker measures to obtain unbiased estimates and more powerful tests of dietary associations. Am J Epidemiol. 2011;174(11):1238-1245. https://doi.org/10.1093/aje/kwr248.
100. Efron B, Tibshirani R. An Introduction to the Bootstrap. Boca Raton, FL: Chapman and Hall; 1994.

101. Fibrinogen Studies Collaboration. Correcting for multivariate measurement error by regression calibration in meta-analyses of epidemiological studies. Stat Med. 2009;28(7):1067-1092. https://doi.org/10.1002/sim.3530.

102. Murad H, Freedman LS. Estimating and testing interactions in linear regression models when explanatory variables are subject to classical measurement error. Stat Med. 2007;26(23):4293-4310. https://doi.org/10.1002/sim.2849.

103. Murad H, Kipnis V, Freedman LS. Estimating and testing interactions when explanatory variables are subject to non-classical measurement error. Stat Methods Med Res. 2016;25(5):1991-2013. https://doi.org/10.1177/0962280213509720.

104. Cook NR, Obarzanek E, Cutler JA, et al. Joint effects of sodium and potassium intake on subsequent cardiovascular disease: the Trials of Hypertension Prevention follow-up study. Arch Intern Med. 2009;169(1):32-40. https://doi.org/10.1001/archinternmed.2008.523.

105. Whittemore AS. Errors-in-variables regression using Stein estimates. Am Stat. 1989;43(4):226-228. https://doi.org/10.1080/00031305.1989.10475663.

106. Shaw PA, Prentice RL. Hazard ratio estimation for biomarker-calibrated dietary exposures. Biometrics. 2012;68(2):397-407. https://doi.org/10.1111/j.1541-0420.2011.01690.x.

107. Liao X, Zucker DM, Li Y, Spiegelman D. Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics. 2011;67(1):50-58. https://doi.org/10.1111/j.1541-0420.2010.01423.x.

108. Cook JR, Stefanski LA. Simulation-extrapolation estimation in parametric measurement error models. J Am Stat Assoc. 1994;89(428):1314-1328. https://doi.org/10.1080/01621459.1994.10476871.

109. Carroll RJ, Küchenhoff H, Lombard F, Stefanski LA. Asymptotics for the SIMEX estimator in nonlinear measurement error models. J Am Stat Assoc. 1996;91(433):242. https://doi.org/10.2307/2291401.

110. Küchenhoff H, Lederer W, Lesaffre E. Asymptotic variance estimation for the misclassification SIMEX. Comput Stat Data Anal. 2007;51(12):6197-6211. https://doi.org/10.1016/j.csda.2006.12.045.

111. Küchenhoff H, Mwalili SM, Lesaffre E. A general method for dealing with misclassification in regression: the misclassification SIMEX. Biometrics. 2006;62(1):85-96. https://doi.org/10.1111/j.1541-0420.2005.00396.x.

112. Stefanski LA, Cook JR. Simulation-extrapolation: the measurement error jackknife. J Am Stat Assoc. 1995;90(432):1247-1256. https://doi.org/10.2307/2291515.

113. Lederer W, Küchenhoff H. simex: SIMEX- and MCSIMEX-algorithm for measurement error models; 2013. http://cran.r-project.org/package=simex.

114. Alexeeff SE, Carroll RJ, Coull B. Spatial measurement error and correction by spatial SIMEX in linear regression models when using predicted air pollution exposures. Biostatistics. 2016;17(2):377-389. https://doi.org/10.1093/biostatistics/kxv048.

115. Oh EJ, Shepherd BE, Lumley T, Shaw PA. Considerations for analysis of time-to-event outcomes measured with error: bias and correction with SIMEX. Stat Med. 2018;37(8):1276-1289. https://doi.org/10.1002/sim.7554.

116. Hardin JW, Schmiediche H, Carroll RJ. The regression-calibration method for fitting generalized linear models with additive measurement error. Stata J. 2003;3(4):361-372. https://doi.org/10.1177/1536867X0400300406.

117. Hardin JW, Schmiediche H, Carroll RJ. The simulation extrapolation method for fitting generalized linear models with additive measurement error. Stata J. 2003;3(4):373-385. https://doi.org/10.1177/1536867X0400300407.

118. He W, Yi GY, Xiong J. Accelerated failure time models with covariates subject to measurement error. Stat Med. 2007;26(26):4817-4832. https://doi.org/10.1002/sim.2892.

119. Brakenhoff TB, Mitroiu M, Keogh RH, Moons KGM, Groenwold RHH, van Smeden M. Measurement error is often neglected in medical literature: a systematic review. J Clin Epidemiol. 2018;98:89-97. https://doi.org/10.1016/j.jclinepi.2018.02.023.

How to cite this article: Keogh RH, Shaw PA, Gustafson P, et al. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1—Basic theory and simple methods of adjustment. Statistics in Medicine. 2020;1–35. https://doi.org/10.1002/sim.8532