Sample size and power estimation when covariates are measured with error

Michael Wallace

London School of Hygiene and Tropical Medicine

Outline

1. Measurement error – what is it and what problems can it cause?

2. What can we do about it?

3. The problem of power – introducing autopower

Measurement error – a crash course

Often impossible to measure covariates accurately:

e.g. dietary intake, blood pressure, weight

Instead, we have error-prone observations

How these relate to the underlying true values is our 'measurement error model'

Common model: “classical” error:

Observed = True + Measurement Error

...but other models are available.

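Why does it matter?

Simple linear regression: $Y = \beta_0 + \beta_1 X + \varepsilon$

Classical measurement error: $W = X + U$, with $U$ independent of $X$ and $\varepsilon$

Regress Y on W to obtain an estimate of $\lambda \beta_1$ rather than $\beta_1$,

where $\lambda = \sigma_X^2 / (\sigma_X^2 + \sigma_U^2) < 1$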
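The attenuation can be checked with a small simulation. This is a minimal sketch, assuming a true covariate X, classical error U (so the observed W = X + U), and a linear outcome Y; the function names and parameter values are illustrative and not taken from the talk.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta0=1.0, beta1=2.0,
                         sd_x=1.0, sd_u=1.0, sd_eps=1.0, seed=1):
    """Compare the slope fitted on true X with the slope fitted on error-prone W."""
    rng = random.Random(seed)
    x = [rng.gauss(0, sd_x) for _ in range(n)]                   # true covariate
    w = [xi + rng.gauss(0, sd_u) for xi in x]                    # observed = true + error
    y = [beta0 + beta1 * xi + rng.gauss(0, sd_eps) for xi in x]  # outcome depends on X only
    lam = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)                    # attenuation factor
    return ols_slope(x, y), ols_slope(w, y), lam * beta1


true_slope, naive_slope, predicted = simulate_attenuation()
print(f"slope using X: {true_slope:.3f}")
print(f"slope using W: {naive_slope:.3f} (attenuation predicts {predicted:.3f})")
```

With equal variances for X and U the attenuation factor is one half, so the slope fitted on W should sit near half the true slope.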

What can we do about it?

Need additional data to tell us about the measurement error:

Validation (accurate measurements on some)

Replication (multiple measurements)
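How replication tells us about the error can be sketched as follows. The example assumes classical error with two independent replicates per subject, W1 = X + U1 and W2 = X + U2, so that Var(W1 - W2) is twice the error variance and Cov(W1, W2) is the variance of the true covariate. All names and values are illustrative.

```python
import random
import statistics


def error_variance_from_replicates(w1, w2):
    """Estimate the error variance: E[(W1 - W2)^2] = 2 * Var(U) under classical error."""
    return statistics.fmean([(a - b) ** 2 for a, b in zip(w1, w2)]) / 2


def true_variance_from_replicates(w1, w2):
    """Estimate Var(X): the replicates share X, so Cov(W1, W2) = Var(X)."""
    m1, m2 = statistics.fmean(w1), statistics.fmean(w2)
    n = len(w1)
    return sum((a - m1) * (b - m2) for a, b in zip(w1, w2)) / (n - 1)


rng = random.Random(3)
n, sd_x, sd_u = 5000, 1.0, 0.5
x = [rng.gauss(0, sd_x) for _ in range(n)]
w1 = [xi + rng.gauss(0, sd_u) for xi in x]  # first error-prone measurement
w2 = [xi + rng.gauss(0, sd_u) for xi in x]  # second, with independent error

var_u_hat = error_variance_from_replicates(w1, w2)
var_x_hat = true_variance_from_replicates(w1, w2)
print(f"estimated error variance: {var_u_hat:.3f} (simulated with {sd_u ** 2:.3f})")
print(f"estimated true variance:  {var_x_hat:.3f} (simulated with {sd_x ** 2:.3f})")
```

No subject needs an accurate measurement here, which is why replication is usually the more practical design.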

Validation 'best', but replication more practical

Huge variety of 'correction methods' available to try to remove the bias induced by measurement error.

Two that are already available in Stata:

Regression calibration (Stata command: rcal)

Simulation extrapolation (Stata command: simex)

...but these don't produce consistent effect estimates in general.
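The idea behind regression calibration can be sketched in the simplest setting. This is not the rcal command: it is an illustrative example with a linear outcome, classical error and two replicates per subject, where the mean of the replicates is replaced by an estimate of E[X | mean of replicates] before the usual regression is run. In this linear case the correction works well; for logistic regression it is only approximate. All names are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration_demo(n=10000, beta1=2.0, sd_u=1.0, seed=2):
    """Return (naive slope, regression-calibrated slope) for simulated data."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta1 * xi + rng.gauss(0, 1) for xi in x]
    w1 = [xi + rng.gauss(0, sd_u) for xi in x]
    w2 = [xi + rng.gauss(0, sd_u) for xi in x]
    wbar = [(a + b) / 2 for a, b in zip(w1, w2)]

    # Error variance from replicate differences; the mean of two has half of it.
    var_u = statistics.fmean([(a - b) ** 2 for a, b in zip(w1, w2)]) / 2
    var_wbar = statistics.variance(wbar)
    lam = (var_wbar - var_u / 2) / var_wbar  # estimated Var(X) / Var(Wbar)

    # Calibration step: best linear predictor of X given the replicate mean.
    mean_w = statistics.fmean(wbar)
    x_hat = [mean_w + lam * (w - mean_w) for w in wbar]
    return ols_slope(wbar, y), ols_slope(x_hat, y)


naive_slope, rc_slope = regression_calibration_demo()
print(f"naive slope: {naive_slope:.3f}, calibrated slope: {rc_slope:.3f}")
```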

Conditional Score

If there is measurement error, then solving estimating equations as normal will give inconsistent effect estimates.

Conditional score solves modified estimating equations to avoid this.

Unlike regression calibration and simulation extrapolation, it produces consistent effect estimates for a range of models, including logistic regression.

We have produced cscore for Stata to implement this method in the case of logistic regression.

The problem of power

Measurement error hits us with a 'double whammy':

Bias

Wider confidence intervals

Bias will often remain a problem even if a correction method is used.

Analytic sample size calculations are generally impossible.

Simulation studies are the only recourse.

autopower aims to remove the legwork.
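A simulation study of this kind can be sketched in a few lines. The example below is not autopower: it is a minimal illustration that simulates univariate logistic regression data, adds classical error to the covariate, fits the naive model, and reports the proportion of simulations in which a Wald test rejects a zero effect. The function names, effect size and error variance are all assumptions made for illustration.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b1, se of b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            v = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    # Variance of b1 is the matching diagonal entry of the inverse information matrix.
    return b1, math.sqrt(h00 / det)


def simulate_power(n, beta1, sd_u, n_sims=200, beta0=0.0, seed=1):
    """Proportion of simulated datasets where the naive Wald test rejects beta1 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) else 0
             for xi in x]
        w = [xi + rng.gauss(0, sd_u) for xi in x]  # error-prone covariate
        b1, se = fit_logistic(w, y)
        if abs(b1 / se) > 1.96:                    # two-sided test at the 5% level
            rejections += 1
    return rejections / n_sims


power_clean = simulate_power(n=200, beta1=0.5, sd_u=0.0)
power_noisy = simulate_power(n=200, beta1=0.5, sd_u=1.0)
print(f"power without error: {power_clean:.2f}, with error: {power_noisy:.2f}")
```

Repeating the call over a range of n gives a crude sample size search; swapping the naive fit for a correction method gives the power of that method instead.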

autopower in brief

autopower simulates datasets that suffer from measurement error.

Then sees how methods perform on these datasets.

Variety of methods available: 'naïve', rcal, simex, cscore

Assumes:

Univariate logistic regression

Subjects are measured either once or twice

Example: specific design

“How well should regression calibration perform on this dataset?”

Example: estimating sample size

“What sample size do I need to achieve 80% power?”

Example: cost minimization

“Obtaining second observations is expensive, can I save money by considering a design where not everyone is measured twice?”

User specifies how much more it costs to measure a subject twice rather than once.

autopower then searches the 'r1-r2' space:

r1 = subjects measured once

r2 = subjects measured twice

Various tricks for practical speed.
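The cost trade-off can be illustrated with a brute-force search. How autopower actually searches the space is not described here, so this is only a sketch: it assumes measuring a subject once costs 1 unit and twice costs cost_ratio units, walks a grid of (r1, r2) designs, and keeps the cheapest one whose power reaches the target. The power function toy_power is a made-up stand-in; in practice each evaluation would be a simulation.

```python
import math


def cheapest_design(power_fn, cost_ratio, target=0.8, max_n=1000, step=50):
    """Return (cost, r1, r2) for the cheapest grid design reaching the target power."""
    best = None
    for r1 in range(0, max_n + 1, step):      # subjects measured once
        for r2 in range(0, max_n + 1, step):  # subjects measured twice
            if r1 + r2 == 0:
                continue
            cost = r1 + cost_ratio * r2
            # Skip the (expensive) power evaluation if this design cannot beat the best.
            if best is not None and cost >= best[0]:
                continue
            if power_fn(r1, r2) >= target:
                best = (cost, r1, r2)
    return best


def toy_power(r1, r2):
    """Hypothetical power curve: a twice-measured subject counts as 1.8 single ones."""
    return 1.0 - math.exp(-(r1 + 1.8 * r2) / 400.0)


for ratio in (1.5, 3.0):
    cost, r1, r2 = cheapest_design(toy_power, cost_ratio=ratio)
    print(f"cost ratio {ratio}: r1={r1}, r2={r2}, total cost={cost}")
```

The cost check before the power check is one simple speed-up: power is only evaluated for designs that could still improve on the best found so far.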

References

General overview: Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. 2006. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Chapman & Hall/CRC.

Regression calibration: Carroll, R. J., and L. A. Stefanski. 1990. Approximate quasilikelihood estimation in models with surrogate predictors. Journal of the American Statistical Association 85: 652–663.

Simulation extrapolation: Cook, J. R., and L. A. Stefanski. 1994. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association 89: 1314–1328.

Conditional score: Stefanski, L. A., and R. J. Carroll. 1987. Conditional scores and optimal scores in generalized linear measurement error models. Biometrika 74: 703–716.
