Capture-recapture and Disease Registers
Geraldine Surman, Matthias Pierce
15 March 2010




Capture-recapture

• What is C-Rc?
• What is C-Rc for?
• Methods
• How useful is it?


What is Capture-recapture?

• Developed by ecologists
• Capture, mark, release, recapture
• The % of marked animals among those recaptured is used to estimate population size
• Epidemiology: increasing use; methods still developing


Disease Registers

• List of cases
• Multiple sources of ascertainment (case identification)


Assumptions

• Closed model: the population does not change (no entries or exits) during the study
• ‘Captures’ are matchable
• Independence of sources
• Homogeneity/equal catchability


Fitting the assumptions to the register

• Closed model: children selected by location at birth
  – Followed up until age 5 years, even if they move out of the area
• Matchability: personal identifiers used
• Homogeneity: every child with cerebral palsy (CP) born in the area should be equally catchable by any one source (NB: severity may affect this)
• Independence: unlikely to be met!


4Child - The Sources

• 22 different source types
• 2337 CP notifications for children born 1984–2003
  – Each child notified up to 5 times
• Sources with < 10 notifications excluded
• Three analytic sources constructed:
  – 1. Child Health Information Systems (45%)
  – 2. Everything else except (40%)
  – 3. Health Visitors (15%)


2 sources of cases

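[Figure: Venn diagram of Source 1 and Source 2. n11 = cases caught by both sources; n10 = Source 1 only; n01 = Source 2 only; n00 = cases missed by both (outside both circles).]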


Analytical methods used – estimators

2-source scenario

                        Source 2
                        1        0        Total
Source 1    1           n11      n10      n1+
            0           n01      n00      n0+
            Total       n+1      n+0      N


Analytical methods used - estimators

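• Assuming the sources are independent, the odds ratio in the 2 × 2 table is 1:

$$OR = \frac{n_{11}\, n_{00}}{n_{10}\, n_{01}} = 1$$

Rearrange to get the Lincoln-Petersen estimate:

$$\hat{n}_{00} = \frac{n_{10}\, n_{01}}{n_{11}}, \qquad \hat{N} = n_{11} + n_{10} + n_{01} + \hat{n}_{00} = \frac{n_{1+}\, n_{+1}}{n_{11}}$$

• For small samples: Chapman’s two-sample nearly unbiased estimate

$$\hat{N}_{chap} = \frac{(n_{1+} + 1)(n_{+1} + 1)}{n_{11} + 1} - 1$$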
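Both estimators work directly from the 2 × 2 cell counts. The following is a minimal Python sketch. The function names and the example counts (100 in both sources, 50 in each source only) are hypothetical and are only there to illustrate the arithmetic. They are not 4Child data.

```python
def lincoln_petersen(n11, n10, n01):
    """Lincoln-Petersen estimate, assuming the two sources are independent.

    Returns (n00_hat, N_hat): the estimated number of cases missed by both
    sources and the estimated total population.
    """
    if n11 == 0:
        raise ValueError("no cases caught by both sources; estimate undefined")
    n00_hat = n10 * n01 / n11
    return n00_hat, n11 + n10 + n01 + n00_hat


def chapman(n11, n10, n01):
    """Chapman's nearly unbiased two-source estimate of the total N."""
    n1_plus = n11 + n10  # all cases caught by source 1
    n_plus1 = n11 + n01  # all cases caught by source 2
    return (n1_plus + 1) * (n_plus1 + 1) / (n11 + 1) - 1


# Hypothetical counts: 100 cases in both sources, 50 in each source only
n00_hat, n_lp = lincoln_petersen(100, 50, 50)
n_chap = chapman(100, 50, 50)
print(f"Lincoln-Petersen: missed = {n00_hat:.1f}, N = {n_lp:.1f}")  # missed = 25.0, N = 225.0
print(f"Chapman: N = {n_chap:.1f}")                                 # N = 224.8
```

The +1 terms in Chapman's version keep the estimate finite when no case is caught by both sources. They also reduce the small-sample bias of the Lincoln-Petersen form.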


3-source scenario

                              Source A
                      Yes                  No
                   Source B             Source B
                   Yes      No          Yes      No
Source C    Yes     a        b           c        d
            No      e        f           g        x

x = cases missed by all three sources (the unobserved cell)


3 sources of cases

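[Figure: Venn diagram of Sources 1, 2 and 3, with the seven overlap regions labelled a to g; x (cases missed by all three sources) lies outside all three circles.]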


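Assessing dependence

If 3 sources are available, dependence between pairs of sources can be assessed and accounted for by modelling the expected frequencies $F_{ijk}$ in the 2 × 2 × 2 contingency table with a log-linear model:

$$\ln F_{ijk} = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}$$

Here $\mu$ is the overall (intercept) term. $\lambda^{A}_{i}$ is the first-order effect of source A at level i. $\lambda^{AB}_{ij}$ is the second-order (interaction) effect of source A at level i and source B at level j.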
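The link from this model to the population estimate is a short worked step. It assumes 0/1 (corner-point) coding, in which every λ term is zero whenever one of its sources is at level 0. This is the situation when the source indicators enter a Poisson regression directly. The unobserved cell x has i = j = k = 0, so:

$$\ln F_{000} = \mu \quad\Longrightarrow\quad \hat{x} = \exp(\hat{\mu}), \qquad \hat{N} = (a + b + c + d + e + f + g) + \exp(\hat{\mu})$$

This holds for every model in the hierarchy. Each main effect and each two-way interaction involves at least one source at level 0 in that cell, so all of them vanish there.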


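Log-linear modelling

• No interaction: sources are independent (1 model)

$$\ln F_{ijk} = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k}$$

• Interaction between 2 sources only (3 models), e.g.

$$\ln F_{ijk} = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij}$$

• Interactions between two pairs of sources (3 models), e.g.

$$\ln F_{ijk} = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik}$$

• Interactions between all three pairs of sources (1 model; saturated, since 7 parameters are fitted to the 7 observed cells)

$$\ln F_{ijk} = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}$$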
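The next example shows how each of the eight models yields its own estimate of the missing cell. This self-contained Python sketch fits every model as a Poisson regression with 0/1 source indicators, using iteratively reweighted least squares (IRLS). It illustrates the same idea as STATA's poisson command but is not the analysis run on the register. The names (design_row, fit_poisson, estimate_all_models) and the toy counts are hypothetical. The counts were built to satisfy independence exactly, so every model should return N = 3000.

```python
import itertools
import math

PAIRS = ((0, 1), (0, 2), (1, 2))  # source pairs AB, AC, BC (by index)


def design_row(pattern, interactions):
    """Intercept, three 0/1 main effects, then the chosen 2-way products."""
    row = [1.0] + [float(v) for v in pattern]
    row += [float(pattern[s] * pattern[t]) for s, t in interactions]
    return row


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Log-link Poisson regression fitted by IRLS (all counts must be > 0)."""
    p, n = len(X[0]), len(y)
    eta = [math.log(yi) for yi in y]  # usual GLM start: fitted means = observed
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted normal equations X'WX beta = X'Wz, with W = diag(mu)
        xtwx = [[sum(mu[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
                for r in range(p)]
        xtwz = [sum(mu[i] * X[i][r] * z[i] for i in range(n)) for r in range(p)]
        new_beta = solve(xtwx, xtwz)
        step = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(xv * b for xv, b in zip(row, beta)) for row in X]
        if step < tol:
            break
    return beta


def estimate_all_models(counts):
    """counts maps each observed capture pattern (a, b, c) to its count.

    Returns {interaction terms: N_hat}, with N_hat = observed + exp(intercept),
    because the unobserved (0, 0, 0) cell has every indicator equal to zero.
    """
    patterns = sorted(counts)
    y = [float(counts[p]) for p in patterns]
    results = {}
    for k in range(len(PAIRS) + 1):
        for terms in itertools.combinations(PAIRS, k):
            X = [design_row(p, terms) for p in patterns]
            beta = fit_poisson(X, y)
            results[terms] = sum(y) + math.exp(beta[0])
    return results


# Hypothetical counts generated from 100 * 2**a * 3**b * 1.5**c (independence)
toy = {(a, b, c): 100 * 2**a * 3**b * 1.5**c
       for a, b, c in itertools.product((0, 1), repeat=3) if (a, b, c) != (0, 0, 0)}
for terms, n_hat in estimate_all_models(toy).items():
    print(terms, round(n_hat, 1))
```

With real register data the eight estimates differ. The choice between them then rests on the fit statistics and parsimony criteria.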


Assessing model fit

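• Using the G² goodness-of-fit statistic:

$$G^2 = 2\sum_{j} Obs_j \ln\left(\frac{Obs_j}{Exp_j}\right)$$

• Then taking account of the parsimony of the model (simple is best!; lower values are better):

$$AIC = G^2 - 2\,(d.f.)$$

$$BIC = G^2 - \ln\left(\frac{N_{obs}}{2\pi}\right)(d.f.)$$

AIC = Akaike information criterion; BIC = Bayesian information criterion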
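Once a model's expected counts are available, the three criteria take only a few lines to compute. This is a minimal Python sketch. It assumes d.f. is the number of observed cells minus the number of fitted parameters, and it takes BIC as G² − ln(N_obs/2π) × d.f., which is our reading of the slide's formula. The observed and fitted counts are hypothetical.

```python
import math


def g_squared(observed, expected):
    """G^2 = 2 * sum(obs * ln(obs / exp)); empty cells contribute zero."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def aic(g2, df):
    """AIC on the G^2 scale: lower values favour the model."""
    return g2 - 2.0 * df


def bic(g2, df, n_obs):
    """BIC on the G^2 scale, with an ln(N_obs / 2*pi) penalty per d.f."""
    return g2 - math.log(n_obs / (2.0 * math.pi)) * df


# Hypothetical observed and fitted counts for one model with 1 d.f.
obs = [120, 80, 60, 40]
fit = [118.0, 82.0, 61.5, 38.5]
g2 = g_squared(obs, fit)
print(round(g2, 3), round(aic(g2, 1), 3), round(bic(g2, 1, sum(obs)), 3))
```

BIC penalises each extra parameter more heavily than AIC once N_obs is moderately large, so it tends to pick simpler models.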


Chapman’s – 4Child data

• Negative dependence between each pair of sources (s1/s2, s1/s3 and s2/s3)
• Population estimates higher than observed
• Fit with assumptions not good


Further Analytical methods

• Log-linear modelling fitted to the data in a contingency table, to account for dependence (interaction) between sources
• Backwards elimination
• Akaike information criterion (AIC) and Bayesian information criterion (BIC) evaluate both the likelihood and the parsimony of the model


Further Analytical methods (2)

• Heterogeneity: stratification by
  – Sex
  – Severity of impairment
  – County of birth
  – Birthweight
• CIs allowing for uncertainty in the observed number of cases
• Chi-squared test of the difference between subgroups
• Revised birth prevalence estimates


Illustration

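• Use the poisson command in STATA to fit the independence log-linear model (maximum likelihood, main effects only)

• poisson n cr1 cr2 cr3
  – n = number of children with each capture pattern; cr1, cr2, cr3 = 0/1 indicators of capture by each source
• Results: constant (β0) = 5.391907
• exp(β0) = 219.6, the estimated number of uncaught cases
• Estimated N = 1355 + 220 = 1575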
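Going from the fitted constant to the population estimate is plain arithmetic. Under 0/1 coding, the fitted count for the never-notified cell is exp(β0). This quick Python check uses only the numbers quoted on the slide. The variable names are illustrative.

```python
import math

beta0 = 5.391907   # constant (_cons) from the independence model
n_observed = 1355  # children on the register

n_missed = math.exp(beta0)  # fitted count for the never-notified cell
n_total = n_observed + n_missed
print(f"exp(beta0) = {n_missed:.1f}, estimated N = {n_total:.0f}")
# exp(beta0) = 219.6, estimated N = 1575
```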


Illustration (2)

• Fit the most complex log-linear model: xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr1*i.cr3 i.cr2*i.cr3
• Results indicate that interactions cr1.cr3 and cr2.cr3 may be dropped
• Running the post-estimation command predict gives a population estimate of 1833


Illustration (3)

• xi: nestreg, qui lr: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr1*i.cr3 i.cr2*i.cr3
  – p-values confirm dropping cr1.cr3 and, marginally, cr2.cr3
• nestreg was rerun, placing each 2-way interaction last in turn, to find the model with the lowest AIC and BIC: keep cr2.cr3


Illustration (4)

• xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr2*i.cr3
• Run estat gof (post-estimation command)
• Results: goodness-of-fit chi2 = 0.0934347, Prob > chi2(1) = 0.7599
• H0: the model fits; the p-value gives no evidence of lack of fit
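With 1 degree of freedom, the chi-squared tail probability has a closed form, P(X > x) = erfc(√(x/2)). The reported p-value can therefore be checked with the Python standard library alone. The same function gives p-values for likelihood-ratio tests between nested models that differ by one interaction term. The function name is hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for X ~ chi-squared with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


p_gof = chi2_sf_df1(0.0934347)  # goodness-of-fit chi2 reported by estat gof
print(f"p = {p_gof:.4f}")       # p = 0.7599
```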


Illustration (5)

• xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr2*i.cr3
• The post-estimation command predict yields a population estimate of 1860


Confidence Intervals

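$$\mathrm{Var}(\hat{N}) = \hat{n}_0^{2} \times \mathrm{SE}(\hat{\beta}_0)^{2} + \frac{n_{obs} \times \hat{n}_0}{\hat{N}}$$

Here n_obs = 1355 and $\hat{n}_0$ = 1860 − 1355 = 505 is the estimated number of uncaught cases:

$$\mathrm{Var}(\hat{N}) = 505^{2} \times 0.1732773^{2} + \frac{1355 \times 505}{1860} = 7657 + 368 = 8025$$

$$\mathrm{SE}(\hat{N}) = \sqrt{8025} = 89.6$$

$$\hat{N} \pm 1.96 \times \mathrm{SE}(\hat{N}) = 1860 \pm 1.96 \times 89.6 = 1684 \text{ to } 2036$$

The STATA command nlcom gives a similar CI: nlcom 1355+exp(_b[_cons]) gives 1688 to 2031. nlcom uses the delta method, SE = $\hat{n}_0$ × SE($\hat{\beta}_0$), and treats the 1355 observed cases as fixed, so its interval is slightly narrower.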
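The interval calculation is straightforward to reproduce. This Python sketch plugs in the inputs quoted on the slide: 505 estimated missing cases, SE(β0) = 0.1732773, 1355 observed and N = 1860. It also computes the delta-method interval for 1355 + exp(β0), which leaves out the second variance term. That is our understanding of what nlcom reports. Variable names are illustrative.

```python
import math

n_obs = 1355                        # children observed on the register
n_total_hat = 1860                  # N-hat from the selected model
n_missed_hat = n_total_hat - n_obs  # 505 estimated uncaught cases
se_beta0 = 0.1732773                # standard error of the constant term

# Variance formula from the slide: model uncertainty + observed-count term
var_total = n_missed_hat**2 * se_beta0**2 + n_obs * n_missed_hat / n_total_hat
se_total = math.sqrt(var_total)
ci = (n_total_hat - 1.96 * se_total, n_total_hat + 1.96 * se_total)

# Delta-method SE for 1355 + exp(beta0), treating 1355 as a fixed constant
se_delta = n_missed_hat * se_beta0
ci_delta = (n_total_hat - 1.96 * se_delta, n_total_hat + 1.96 * se_delta)

print(f"Var = {var_total:.0f}, SE = {se_total:.1f}, 95% CI {ci[0]:.0f} to {ci[1]:.0f}")
print(f"delta method: 95% CI {ci_delta[0]:.0f} to {ci_delta[1]:.0f}")
```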


Results and Conclusions

• Overall, a significant % of individuals were uncaptured
• Low % missing for severe motor impairment
• Two counties: 6–10% missing


Results and Conclusions (2)

• ‘Corrected’ birth prevalence estimates higher than observed but supported a decline in CP rate over time

• Good ascertainment is possible where resources are focussed


Estimating the UK-wide prevalence of gastroschisis using 3 sources and C/R

• 3 sources:
  – UK Obstetric Surveillance System (UKOSS)
  – British Association of Paediatric Surgeons (BAPS)
  – British Isles Network of Congenital Anomalies Registers (BINOCAR): <50% coverage
• 2 different analyses: BINOCAR areas and non-BINOCAR areas


BINOCAR area analysis

• 2-source capture-recapture analysis of all cases caught by BINOCAR and UKOSS in BINOCAR areas
  – Confidence intervals using the G² goodness-of-fit statistic
• 3-source analysis of all live-birth cases to assess independence of sources
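The slide does not say how the G²-based intervals were built. One common construction, assumed here, keeps every candidate value of the missing cell n00 for which the G² of the completed 2 × 2 table under independence stays at or below 3.84, the 95% point of chi-squared on 1 d.f. This is a minimal Python sketch of that idea. The function names and example counts are hypothetical.

```python
import math


def g2_independence(n11, n10, n01, n00):
    """G^2 for a completed 2 x 2 table under independence of the two sources."""
    cells = ((n11, n10), (n01, n00))
    total = n11 + n10 + n01 + n00
    rows = [sum(r) for r in cells]
    cols = [cells[0][j] + cells[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            obs = cells[i][j]
            if obs > 0:  # an empty cell contributes nothing to G^2
                expected = rows[i] * cols[j] / total
                g2 += 2.0 * obs * math.log(obs / expected)
    return g2


def g2_interval(n11, n10, n01, crit=3.84, max_missing=10_000):
    """Range of N for which the completed table's G^2 stays at or below crit.

    Candidate values of the missing cell n00 are scanned from 0 to max_missing.
    """
    kept = [m for m in range(max_missing + 1)
            if g2_independence(n11, n10, n01, m) <= crit]
    if not kept:
        raise ValueError("no candidate n00 inside the interval; widen max_missing")
    n_observed = n11 + n10 + n01
    return n_observed + min(kept), n_observed + max(kept)


# Hypothetical counts: Lincoln-Petersen gives n00 = 25, N = 225
print(g2_interval(100, 50, 50))
```

The G² of the completed table is zero at the Lincoln-Petersen value of n00 and rises on either side. The interval is therefore asymmetric whenever the likelihood is.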


Non-BINOCAR area analysis

• Two-source analysis of live-birth data only
• UKOSS under-ascertainment estimated
• Total estimated cases extrapolated using
  – CIs calculated using bootstrapping
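The bootstrap scheme is not described on the slide. This minimal Python sketch assumes nonparametric resampling of the observed cases with replacement. Each resample is re-estimated with Chapman's estimator, and the 2.5th and 97.5th percentiles give the interval. It covers only the two-source step, not the extrapolation. The names, seed and counts are hypothetical.

```python
import random


def chapman(n11, n10, n01):
    """Chapman's two-source estimate of N."""
    return (n11 + n10 + 1) * (n11 + n01 + 1) / (n11 + 1) - 1


def bootstrap_ci(n11, n10, n01, reps=2000, alpha=0.05, seed=2010):
    """Percentile bootstrap CI for N, resampling observed cases with replacement."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    cases = ["both"] * n11 + ["s1_only"] * n10 + ["s2_only"] * n01
    estimates = []
    for _ in range(reps):
        sample = rng.choices(cases, k=len(cases))
        estimates.append(chapman(sample.count("both"),
                                 sample.count("s1_only"),
                                 sample.count("s2_only")))
    estimates.sort()
    lower = estimates[int(reps * alpha / 2)]
    upper = estimates[int(reps * (1 - alpha / 2)) - 1]
    return lower, upper


print(bootstrap_ci(100, 50, 50))
```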


Calculating prevalence estimate

• Need to accommodate two sources of variation: that in the prevalence estimate and that in the C/R estimate
• To combine these two sources, use techniques borrowed from multiple imputation (Rubin’s rules)
• After M bootstraps:

$$V_{Tot} = \frac{1}{M}\sum_{i=1}^{M}\sigma_i^{2} + \left(1 + \frac{1}{M}\right)V_B$$

V_B is the variance of the bootstrapped estimates of the incidence. σ_i² is the square of the standard error of the incidence in the i-th bootstrap.
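This is a minimal Python sketch of the combination step. It assumes V_B is the usual sample variance (divisor M − 1) of the M bootstrapped estimates, as in Rubin's rules for multiple imputation. The function name and the inputs are hypothetical.

```python
import math
import statistics


def combine(estimates, std_errors):
    """Pool M bootstrapped estimates with Rubin's rules.

    Returns (pooled estimate, total variance, total SE), where
    total variance = mean(se_i^2) + (1 + 1/M) * V_B.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates, each with a standard error")
    within = sum(se ** 2 for se in std_errors) / m  # average sigma_i^2
    between = statistics.variance(estimates)         # V_B across bootstraps
    total = within + (1 + 1 / m) * between
    return statistics.mean(estimates), total, math.sqrt(total)


# Hypothetical prevalence estimates and standard errors from 5 bootstraps
est = [4.1, 4.4, 3.9, 4.2, 4.3]
ses = [0.30, 0.32, 0.29, 0.31, 0.30]
print(combine(est, ses))
```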


Findings

• Amongst regions where BINOCAR operates, C/R did not add many cases (0.01%)
  – Not surprising, since BINOCAR caught >95% of cases and 88% of cases were caught by two or more sources
• In non-BINOCAR areas, an extra 15% of cases were added after C/R
  – 49% caught by both registers


Would I use C-Rc again?

• % of individuals notified only once
• C-Rc:
  – Much more complex initially
  – Once set up, easy to repeat
  – Provides population estimates
  – Comparable with other studies
• Yes, I would use it again, always emphasising the range of estimates given by the CIs