MISSING DATA IN THE INFECTIOUS DISEASES INSTITUTE CLINIC DATABASE Agnes N Kiragga East Africa IeDEA investigators’ meeting 4-5 th May 2010 East African Regional consortium


MISSING DATA IN THE INFECTIOUS DISEASES INSTITUTE CLINIC DATABASE

Agnes N Kiragga

East Africa IeDEA investigators' meeting, 4-5th May 2010

East African Regional consortium


Objectives

• Describe level of missing data for key variables

• Identify factors associated with missing data for patients on antiretroviral therapy (ART)

• Assess missing data assumptions in observational databases


Assumptions of missing data

• "Missing completely at random" [MCAR]: not dependent on anything important
  – blood sample lost or not taken in error

• "Missing at random" [MAR]: dependent only on other measured factors, not on the missing (unobserved) value
  – study specifies blood pressure below a threshold, so after registering a high value, patient is withdrawn [blood pressure at this visit]

• "Missing not at random" [MNAR]: related to the missing outcome itself
  – patient withdrew from study because they "didn't feel well"
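
The three mechanisms can be illustrated with a small simulation. This is only a sketch on synthetic data, not IDI data: the variables (`weight`, `cd4`), their distributions and the missingness probabilities are all assumptions chosen for illustration.

```python
import random
from statistics import mean

def simulate(mechanism, n=20000, seed=1):
    """Return (mean of all true CD4 values, mean of the observed ones only)."""
    rng = random.Random(seed)
    true_values, observed = [], []
    for _ in range(n):
        weight = rng.gauss(60, 10)                     # hypothetical measured covariate
        cd4 = max(0.0, 3 * weight + rng.gauss(0, 50))  # hypothetical outcome, correlated with weight
        if mechanism == "MCAR":
            p_missing = 0.3                            # same probability for everyone
        elif mechanism == "MAR":
            p_missing = 0.6 if weight < 60 else 0.1    # depends only on an observed covariate
        elif mechanism == "MNAR":
            p_missing = 0.6 if cd4 < 180 else 0.1      # depends on the value that goes missing
        else:
            raise ValueError(mechanism)
        true_values.append(cd4)
        if rng.random() >= p_missing:                  # value is recorded
            observed.append(cd4)
    return mean(true_values), mean(observed)

for m in ("MCAR", "MAR", "MNAR"):
    full, seen = simulate(m)
    print(f"{m}: full mean = {full:.1f}, observed-only mean = {seen:.1f}")
```

In this toy setup the observed-only mean stays close to the full mean under MCAR and drifts upward under MAR and MNAR. Under MAR the drift can be removed by comparing within levels of the measured covariate; under MNAR it cannot be removed using the observed data alone.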


Study population, 04/2000 – 04/2010

• Registered: 23121
• Active: 15070
• Non-ART: 13310
• ART: 9811
  – DART: 300
  – Remaining: 9511 (before 2005: 1043; after 2005: 8468)


Variables: data recorded

Table columns: Recorded at every clinic visit | Recorded only when event occurs

• Demographic: Gender (a), Date of birth (a), Weight, Height (a)
• Clinical: WHO stage, Karnofsky score
• Laboratory: CD4 T-cell, CBC (Hemoglobin, Lymphocytes)
• Other variables: Opportunistic infections, Toxicity, ART regimen, Reason for ART switch/stop, ART switch date, Adherence score

[Each variable is marked with an X under one of the two columns; the column placement is not recoverable here.]


Source of CD4 data

• Electronic download: 86146 (95%)

• Recorded: 3085 (5%)


Missing baseline variables (N = 9511)

Variable                   | Number missing (%)
---------------------------+-------------------
Age                        | 0 (0)
Gender                     | 0 (0)
WHO clinical stage         | 0 (0)
Weight (Kg)                | 13 (0.1)
Height (cm)                | 1032 (10.8)
CD4+ count (cell/μL) [1]   | 3126 (32.8)
CD4+ count (cell/μL) [2]   | 1641 (17.2)
CD4+ count (cell/μL) [3]   | 1350 (14.2)
ART regimen                | 0 (0)

Note: [1] = 3 months pre-ART, [2] = 6 months pre-ART, [3] = 12 months pre-ART
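
A tally of this kind can be produced by counting absent values per variable. The sketch below assumes records held as dictionaries with `None` marking a missing value; the field names and the four toy records are hypothetical, not IDI data.

```python
def missing_summary(records, variables):
    """Return {variable: (number missing, percent missing)} for a list of dict records."""
    n = len(records)
    if n == 0:
        return {}
    summary = {}
    for var in variables:
        # A value counts as missing if the key is absent or its value is None.
        n_missing = sum(1 for r in records if r.get(var) is None)
        summary[var] = (n_missing, round(100 * n_missing / n, 1))
    return summary

# Toy records (hypothetical); None marks a missing value.
patients = [
    {"age": 34, "weight_kg": 58.0, "height_cm": 165, "cd4": 120},
    {"age": 41, "weight_kg": 62.5, "height_cm": None, "cd4": None},
    {"age": 29, "weight_kg": None, "height_cm": 158, "cd4": 85},
    {"age": 37, "weight_kg": 70.0, "height_cm": 172, "cd4": None},
]

variables = ["age", "weight_kg", "height_cm", "cd4"]
for var, (k, pct) in missing_summary(patients, variables).items():
    print(f"{var}: {k} ({pct})")
```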


Number of missing baseline variables

Note: variables include weight, height and CD4 count

                  | Number of missing variables, n (%)
Year of ART start | 0            | 1            | 2
------------------+--------------+--------------+------------
≤ 2004            | 198 (3.5)    | 648 (19.2)   | 197 (47.4)
2005              | 1878 (32.9)  | 848 (25.1)   | 100 (24.0)
2006              | 971 (17.0)   | 419 (12.4)   | 24 (5.8)
2007              | 1154 (20.2)  | 627 (18.6)   | 30 (7.2)
2008              | 739 (12.9)   | 379 (11.2)   | 33 (7.9)
2009              | 694 (12.1)   | 385 (11.4)   | 26 (6.3)
2010              | 81 (1.4)     | 74 (2.2)     | 6 (1.4)
------------------+--------------+--------------+------------
Total             | 5715 (100)   | 3380 (100)   | 416 (100)


Factors associated with missing baseline CD4 count

No association with gender, age, weight

Variable                       | Missing N=3167 | NOT missing N=6344 | p
-------------------------------+----------------+--------------------+--------
Missing baseline height, n (%) | 411 (13)       | 621 (9.8)          | <0.0001
ART regimen                    |                |                    | <0.0001
  Nevirapine                   | 1798 (56.8)    | 3698 (58.3)        |
  Efavirenz                    | 1143 (36.0)    | 2425 (38.2)        |
  PI                           | 154 (4.9)      | 83 (1.3)           |
  Other                        | 72 (2.3)       | 138 (2.2)          |
Year of ART initiation         |                |                    | <0.0001
  ≤2004                        | 562 (17.7)     | 481 (7.6)          |
  2005                         | 720 (22.7)     | 2106 (33.2)        |
  2006                         | 415 (13.2)     | 999 (15.7)         |
  2007                         | 614 (19.4)     | 1197 (18.9)        |
  2008                         | 390 (12.3)     | 761 (12.0)         |
  2009                         | 387 (12.2)     | 718 (11.3)         |
  2010                         | 79 (2.5)       | 82 (1.3)           |
OUTCOMES                       |                |                    |
Study status                   |                |                    | <0.0001
  Active                       | 2365 (33.8)    | 4623 (72.9)        |
  Dead                         | 199 (24.0)     | 629 (9.9)          |
  Lost                         | 76 (65.5)      | 40 (0.6)           |
  Transferred                  | 354 (33.1)     | 716 (11.3)         |
  Missing                      | 173 (34.0)     | 336 (5.3)          |
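
The p-value for missing baseline height can be checked from the counts on the slide. The slide does not say which test was used, so the Pearson chi-square test on the 2x2 table below is an assumption; the function name is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value). With 1 degree of freedom the upper-tail
    probability is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))

# Counts from the slide: height is missing for 411 of the 3167 patients with
# missing baseline CD4 and for 621 of the 6344 patients with baseline CD4 recorded.
stat, p = chi_square_2x2(411, 3167 - 411, 621, 6344 - 621)
print(f"chi-square = {stat:.1f}, p = {p:.2g}")
```

For these counts the statistic is well above 15.14, the critical value for p = 0.0001 with 1 degree of freedom, which agrees with the reported p <0.0001.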


CD4 counts at follow-up visits

• CD4 tested 6 monthly (± 2 months)
• Exclude baseline CD4 counts
• Complete CD4 data: No. of CD4 tests expected given duration on ART >= total No. of CD4 counts observed
• Missing CD4 data: No. of CD4 tests expected given duration on ART ≠ total No. of CD4 counts observed
• 1423 (15%): insufficient follow-up
• 8088 (85%) assessed for missing CD4
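
The classification above can be sketched as a small function. The assumptions here are mine, not the slide's: one follow-up test is expected per completed 6 months on ART, a record counts as complete when the observed count reaches the expected count, fewer than 6 months on ART counts as insufficient follow-up, and the ± 2 month window is not modelled. The function and argument names are hypothetical.

```python
def classify_followup_cd4(months_on_art, n_followup_cd4, interval_months=6):
    """Classify follow-up CD4 data as complete, missing, or not assessable."""
    # Number of follow-up tests expected so far, one per completed interval.
    expected = int(months_on_art // interval_months)
    if expected == 0:
        return "insufficient follow-up"
    return "complete" if n_followup_cd4 >= expected else "missing"

print(classify_followup_cd4(4, 0))    # under 6 months on ART
print(classify_followup_cd4(20, 3))   # 3 tests expected, 3 observed
print(classify_followup_cd4(20, 1))   # 3 tests expected, 1 observed
```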


Categorization of follow-up CD4 data (N = 8088)

Categorization                         |      Freq.    Percent
---------------------------------------+----------------------
complete baseline + complete follow-up |      2,878      35.58
complete baseline + missing follow-up  |      2,529      31.27
missing baseline + complete follow-up  |      1,315      16.26
missing baseline + missing follow-up   |      1,366      16.89
---------------------------------------+----------------------
Total                                  |      8,088     100.00

• Complete baseline + complete f/up + CD4 testing + timely CD4 tests = 864 (10.7%)

• Included all nested research cohort patients


Categorization of follow-up CD4 data by year of ART initiation, for patients with at least 6 months of follow-up

[Figure: percentage of patients (0% to 100%) in each category by year of ART initiation: <=2004 (n=995), 2005 (n=2487), 2006 (n=1174), 2007 (n=1555), 2008 (n=960), 2009 (n=917)]


Validation of incident Post-ART Tuberculosis cases

• Tuberculosis most common opportunistic infection (rate (95% CI) 2.79 (2.45-3.16)) in first 24 months after ART initiation

• Merged flagged TB cases with TB drug database

• Identified patients on TB treatment

• 334 incident post-ART cases


Assumption 1

Baseline CD4 data missing completely at random

[Figure: Probability of development of Tuberculosis (TB) by baseline CD4 data. x-axis: analysis time (0 to 2.5); y-axis: probability (0.00 to 0.30); curves: complete baseline CD4 data, missing baseline CD4 data. Log rank P<0.435]
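
The slide does not say how the curves were produced. One common way to obtain a plot of this kind is to take 1 minus the Kaplan-Meier survival estimate within each group. The sketch below assumes that approach and runs on hypothetical toy data, not IDI data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time for each patient
    events: 1 if TB was diagnosed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Toy data (hypothetical): years of follow-up and TB indicator.
times = [0.2, 0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 2.5]
events = [1, 0, 1, 0, 1, 0, 0, 0]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: estimated probability of TB by t = {1 - s:.3f}")
```

Running the function once per group (complete versus missing baseline CD4) gives the two curves; the log-rank test then asks whether they differ by more than chance.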


Assumption 2

Baseline CD4 data missing at random

[Figure: Probability of development of Tuberculosis (TB) by follow-up CD4 data. x-axis: analysis time (0 to 2.5); y-axis: probability (0.00 to 0.30); curves: missing follow-up CD4 data, complete follow-up CD4 data]


Preliminary Insights from analysis

• Reconcile local and IeDEA wide analyses
• Baseline CD4 missing completely at random (MCAR)
• Follow-up CD4 data missing at random
• Ignoring the missing data will lead to biased estimates of ART
• Strategies needed to identify patterns and mechanisms of missing data in observational data prior to analysis


Planned analyses

• Missing data and other HIV outcomes, e.g.
  – immune response
  – incidence of other opportunistic infections
  – toxicity
  – treatment changes/switches

• Strength of nested research cohort can be used to validate imputed data in large database

• CD4 trajectories versus mortality: estimate the distribution of CD4 marker trajectories and the distribution of log survival time using mixed-effects models, measuring time from the first pre-HAART CD4