Sample size and power estimation when covariates are measured with error
Michael Wallace, London School of Hygiene and Tropical Medicine
Outline
1. Measurement error – what is it and what problems can it cause?
2. What can we do about it?
3. The problem of power – introducing autopower
Measurement error – a crash course
Often impossible to measure covariates accurately:
e.g. Dietary intake, blood pressure, weight
Instead, we have error-prone observations
How these relate to the underlying true values is our 'measurement error model'
Common model: 'classical' error:
Observed = True + Measurement Error
...but other models are available.
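Why does it matter?
Simple linear regression: $Y = \beta_0 + \beta_1 X + \varepsilon$
Classical measurement error: $W = X + U$, with $U$ independent of $X$ and $\varepsilon$
Regress Y on W to obtain an estimate of $\lambda \beta_1$ rather than $\beta_1$,
where $\lambda = \sigma_X^2 / (\sigma_X^2 + \sigma_U^2) < 1$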
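The attenuation can be checked with a small simulation. The sketch below assumes a linear model Y = b0 + b1*X + error with classical error W = X + U, all terms normal; the parameter values and function names are illustrative choices, not taken from the talk. It compares the slope from regressing Y on W with the value predicted by the reliability ratio var(X) / (var(X) + var(U)).

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta0=1.0, beta1=2.0,
                         sd_x=1.0, sd_u=1.0, sd_eps=1.0, seed=1):
    """Return (slope using true X, slope using W, predicted attenuated slope)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]                 # true covariate
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]                  # error-prone version
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sd_eps) for xi in x]
    lam = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)                    # reliability ratio
    return ols_slope(x, y), ols_slope(w, y), lam * beta1


true_fit, naive_fit, predicted = simulate_attenuation()
print(f"slope using X:         {true_fit:.3f}")
print(f"slope using W (naive): {naive_fit:.3f}")
print(f"lambda * beta1:        {predicted:.3f}")
```

With equal variances for X and U the reliability ratio is 0.5, so the naive slope should sit near half the true slope.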
What can we do about it?
Need additional data to tell us about the measurement error:
Validation (accurate measurements on some)
Replication (multiple measurements)
Validation 'best', but replication more practical
Huge variety of 'correction methods' available to try and remove bias induced by measurement error.
Two that are already available in Stata:
Regression calibration (Stata command: rcal)
Simulation extrapolation (Stata command: simex)
...but these don't produce consistent effect estimates in general.
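To make the idea behind regression calibration concrete, here is a minimal sketch using replicate data. It is not the rcal implementation: it assumes a linear outcome model (where the correction is exact, unlike the logistic case), two replicates per subject, and normal classical error, and all names and parameter values are illustrative. The error variance is estimated from within-subject differences, and the mean of the replicates is then replaced by an estimate of E[X | mean(W)] before fitting.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2)
n, beta1, sd_x, sd_u = 20000, 1.5, 1.0, 0.8

x = [rng.gauss(0.0, sd_x) for _ in range(n)]          # unobserved truth
w1 = [xi + rng.gauss(0.0, sd_u) for xi in x]          # first replicate
w2 = [xi + rng.gauss(0.0, sd_u) for xi in x]          # second replicate
y = [0.5 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Var(W1 - W2) = 2 * var_u, so half the mean squared difference estimates var_u.
var_u_hat = statistics.fmean([(a - b) ** 2 for a, b in zip(w1, w2)]) / 2

# The replicate mean has variance var_x + var_u / 2.
wbar = [(a + b) / 2 for a, b in zip(w1, w2)]
mu_hat = statistics.fmean(wbar)
var_wbar = statistics.variance(wbar)
lam_hat = (var_wbar - var_u_hat / 2) / var_wbar       # reliability of the mean

# Calibration step: replace wbar by the estimated E[X | wbar].
x_cal = [mu_hat + lam_hat * (wb - mu_hat) for wb in wbar]

naive = ols_slope(wbar, y)
calibrated = ols_slope(x_cal, y)
print(f"naive slope:      {naive:.3f}")
print(f"calibrated slope: {calibrated:.3f}  (true value {beta1})")
```

In this linear setting the calibrated slope is just the naive slope divided by the estimated reliability ratio; for logistic regression the same substitution is only an approximation.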
Conditional Score
If there is measurement error, then solving estimating equations as normal will give inconsistent effect estimates.
Conditional score solves modified estimating equations to avoid this.
Unlike regression calibration and simulation extrapolation, it produces consistent effect estimates for a range of models, including logistic regression.
We have produced cscore for Stata to implement this method in the case of logistic regression.
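The sketch below illustrates the modified estimating equations for univariate logistic regression. It is not the cscore code. It assumes normal classical error with known variance var_u, and uses the standard construction in which W is replaced by Delta = W + Y * var_u * b1 and the linear predictor becomes b0 + b1 * Delta - b1^2 * var_u / 2. Function names and parameter values are illustrative. Rather than solving the equations, it evaluates both sets of estimating functions at the true parameters: the conditional score averages close to zero, while the usual score with W in place of X does not.

```python
import math
import random
import statistics


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def naive_score(b0, b1, w, y):
    """Mean of the usual logistic score with W used in place of X."""
    r = [yi - expit(b0 + b1 * wi) for wi, yi in zip(w, y)]
    return (statistics.fmean(r),
            statistics.fmean([ri * wi for ri, wi in zip(r, w)]))


def conditional_score(b0, b1, w, y, var_u):
    """Mean of the conditional score estimating functions."""
    delta = [wi + yi * var_u * b1 for wi, yi in zip(w, y)]
    r = [yi - expit(b0 + b1 * di - 0.5 * b1 ** 2 * var_u)
         for di, yi in zip(delta, y)]
    return (statistics.fmean(r),
            statistics.fmean([ri * di for ri, di in zip(r, delta)]))


rng = random.Random(3)
n, b0, b1, var_u = 20000, -0.5, 1.0, 1.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if rng.random() < expit(b0 + b1 * xi) else 0 for xi in x]
w = [xi + rng.gauss(0.0, math.sqrt(var_u)) for xi in x]

naive = naive_score(b0, b1, w, y)
cond = conditional_score(b0, b1, w, y, var_u)
print("naive score at truth:       (%.3f, %.3f)" % naive)
print("conditional score at truth: (%.3f, %.3f)" % cond)
```

An estimator is obtained by finding the (b0, b1) at which the conditional score is zero, for example by Newton's method started from the naive fit.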
The problem of power
Measurement error hits us with a 'double whammy':
Bias
Wider confidence intervals
Bias will often remain a problem even if a correction method is used.
Sample size calculations generally impossible.
Simulation studies only recourse.
autopower aims to remove the leg work.
autopower in brief
autopower simulates datasets that suffer from measurement error.
Then sees how methods perform on these datasets.
Variety of methods available:
'naïve', rcal, simex, cscore
Assumes:
Univariate logistic regression
Subjects are measured either once or twice
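The simulate-then-analyse loop can be sketched as follows. This is an illustration of the general approach, not autopower itself: it covers only the naive analysis with a single measurement per subject, assumes normal X and normal classical error, and uses a two-sided Wald test at the 5% level. All function names and parameter values are assumptions made for the example.

```python
import math
import random


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of logit P(Y=1) = b0 + b1*x; returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            v = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det      # Newton step: inverse information times score
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Information from the last iteration; at convergence it matches the final fit.
    return b0, b1, math.sqrt(h00 / det)


def simulated_power(n, beta0, beta1, sd_x, sd_u, n_sims=300, seed=0):
    """Share of simulated datasets in which the naive Wald test rejects b1 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, sd_x) for _ in range(n)]
        y = [1 if rng.random() < expit(beta0 + beta1 * xi) else 0 for xi in x]
        w = [xi + rng.gauss(0.0, sd_u) for xi in x]   # error-prone covariate
        _, b1, se = fit_logistic(w, y)
        if abs(b1 / se) > 1.959964:
            rejections += 1
    return rejections / n_sims


power_clean = simulated_power(200, 0.0, 0.4, 1.0, sd_u=0.0)
power_err = simulated_power(200, 0.0, 0.4, 1.0, sd_u=1.0)
print(f"power without measurement error: {power_clean:.2f}")
print(f"power with measurement error:    {power_err:.2f}")
```

Because both calls share a seed and draw random numbers in the same order, they analyse the same X and Y, so the drop in power reflects the measurement error alone.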
Example: specific design
“How well should regression calibration perform on this dataset?”
Example: estimating sample size
“What sample size do I need to achieve 80% power?”
Example: cost minimization
“Obtaining second observations is expensive, can I save money by considering a design where not everyone is measured twice?”
User specifies how much more it costs to measure a subject twice rather than once.
autopower then searches the 'r1-r2' space:
r1 = subjects measured once
r2 = subjects measured twice
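A brute-force version of such a search might look like the sketch below. It is not how autopower searches, and it uses none of the speed tricks. The search takes any function that returns power for a design (r1, r2); the `toy_power` function here is a crude placeholder invented for the example, standing in for a simulation-based estimate. Cost is measured in units of one single-measurement subject, with `cost_ratio` giving the relative cost of a subject measured twice.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def toy_power(r1, r2, effect=0.15, var_x=1.0, var_u=1.0):
    """Placeholder power function (an assumption for illustration only).

    Each subject is weighted by the reliability of their measurement:
    one reading for r1 subjects, the mean of two readings for r2 subjects.
    """
    lam1 = var_x / (var_x + var_u)
    lam2 = var_x / (var_x + var_u / 2.0)
    n_eff = r1 * lam1 + r2 * lam2
    return normal_cdf(effect * math.sqrt(n_eff) - 1.959964)


def cheapest_design(power_fn, cost_ratio, target=0.8, max_n=1000, step=25):
    """Grid search for the cheapest (r1, r2) whose power reaches the target.

    Returns (cost, r1, r2), or None if no design on the grid is adequate.
    """
    best = None
    for r1 in range(0, max_n + 1, step):
        for r2 in range(0, max_n + 1, step):
            if r1 + r2 == 0 or power_fn(r1, r2) < target:
                continue
            cost = r1 + cost_ratio * r2
            if best is None or cost < best[0]:
                best = (cost, r1, r2)
    return best


cheap_repeat = cheapest_design(toy_power, cost_ratio=1.2)
costly_repeat = cheapest_design(toy_power, cost_ratio=2.0)
print("second measurement adds 20% to cost:", cheap_repeat)
print("second measurement doubles the cost:", costly_repeat)
```

Under this placeholder the cheapest design switches from measuring everyone twice to measuring everyone once as the second measurement becomes more expensive; a simulation-based power function could favour mixed designs.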
Various tricks for practical speed.
References
General overview: Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. 2006. Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition. Chapman & Hall/CRC.
Regression calibration: Carroll, R. J., and L. A. Stefanski. 1990. Approximate quasilikelihood estimation in models with surrogate predictors. Journal of the American Statistical Association 85: 652–663.
Simulation extrapolation: Cook, J. R., and L. A. Stefanski. 1994. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association 89: 1314–1328.
Conditional score: Stefanski, L. A., and R. J. Carroll. 1987. Conditional scores and optimal scores in generalized linear measurement error models. Biometrika 74: 703–716.