The Case for Clinical Repositories as a Data Source for Research Clem McDonald, MD Director Lister Hill National Center for Biomedical Communications National Library of Medicine National Advisory Research Resources Council (NARRC) Meeting January 18, 2007


The Case for Clinical Repositories as a Data Source for Research

Clem McDonald, MD, Director

Lister Hill National Center for Biomedical Communications

National Library of Medicine

National Advisory Research Resources Council (NARRC) Meeting, January 18, 2007


Clinical Repositories

What do they do?

What could they do?


Why Clinical Repositories

• Many sources of electronic data in an institution
  • Labs, radiology, pharmacy, MD orders, EKGs, dictated reports, radiology images, etc.

• Most of these sources can deliver this data via HL7 messages to another computer

• A repository is a database that provides simple, unified access to all of this data.


What Data is Available Within the Institution

• Lab data (almost always electronic)
• Medication orders for inpatients
• Radiology reports (text)
• Pathology reports (text)
• Dictation (discharge summary)
• EKGs
• Cardiac echoes
• Endoscopy
• Obstetrical ultrasound
• Nursing observations


More Data – “Outside” the Institution

• Event Data – Coded diagnoses and procedures

• Tumor registries – whole country

• Cardiology data bases (ACC, ATS, etc.) – whole country

• Federal ESRD database

• Out patient medications – From pharmacy benefit managers. Rx.Hub

• More


Potential Availability Still More

• Medicaid – procedures, diagnoses and drug use

• Medicare – Diagnoses, procedures and (now) medication use

• Lots of special federal collection instruments – Nursing home, disability, Medicare introduction, etc.


Why Codes for Observations Are So Important

• The observation is not defined by the field, as is typical in research and policy databases


Flat Data Set (Analytic Conceptualization)

Pat ID   Name      Surgery date  Hgb   DBP  # of BPU  Bypass minutes  Cholest
1234-5   Doe Jane  12May95       13    95   3         80              180
9999-3   Jones T   1Aug95        12.5  88   2         90              230
8888-3   Doe Sam   4June95       16    78   0         80              205


Why Observation Codes Are Important

• The observation ID is defined by a “pointer” to a master table, as follows


Stacked Data Set (Application Conceptualization)

Pt ID  Relevant Date  Observation ID  Value  Units  Normal Range  Place       Observer
Doe J  12-May-95      Hemoglobin      13     g/dL   12.5-15       St Francis  Dr Smith
Doe J  12-May-95      Hemoglobin      11.5   g/dL   12.5-15       St Francis  Dr Smith
Doe J  12-May-95      Dias BP         95     mm Hg  80-140        St Francis  Dr Smith
Doe J  12-May-95      Dias BP         110    mm Hg  80-140        St Francis  Dr Smith
Doe J  13-May-95      Bypass minutes  80     min                  St Francis  Dr Sleepwell
Doe J  12-May-95      Cholesterol     180                         St Francis  Dr Bloodbank

Operational Data Base: One Record Per Observation
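The relationship between the two layouts can be sketched in a few lines of code. This is only an illustration, not the schema of any real repository: the `Observation` class, its field names, and the `to_flat` helper are hypothetical, and the rows are a subset of the example table. It shows that a flat analytic row can be derived from stacked records, but only after deciding what to do with repeated values (here, the last value seen for each observation wins).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Observation:
    """One stacked record: a single result plus its context (hypothetical fields)."""
    patient: str
    date: str
    obs_id: str    # in a real repository this would point into a master table
    value: float
    observer: str = ""


stacked = [
    Observation("Doe J", "12-May-95", "Hemoglobin", 13, "Dr Smith"),
    Observation("Doe J", "12-May-95", "Hemoglobin", 11.5, "Dr Smith"),
    Observation("Doe J", "12-May-95", "Dias BP", 95, "Dr Smith"),
    Observation("Doe J", "13-May-95", "Bypass minutes", 80, "Dr Sleepwell"),
    Observation("Doe J", "12-May-95", "Cholesterol", 180, "Dr Bloodbank"),
]


def to_flat(records):
    """Pivot stacked records into one dict per patient.

    Repeated observations overwrite earlier ones, so information is lost;
    that loss is the price of the flat layout.
    """
    flat = defaultdict(dict)
    for r in records:
        flat[r.patient][r.obs_id] = r.value
    return dict(flat)


flat = to_flat(stacked)
print(flat["Doe J"])
```

Going the other way (flat to stacked) is not possible without outside knowledge, because the flat row has already discarded the date, place, and observer of each value.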


For Repositories – Need to Think in a Different Data Structure

Instead of dedicating one data field (in a visit record) per result, as is the common model in clinical research:

• Dedicate a record per result.
• That structure is found in every lab, repository, and pharmacy system
• You will never find a field for hemoglobin or cholesterol, or for penicillin V
• The record carries extra pieces of information about each value as follows


Limits and Issues Depending Upon How the Data is Represented

Things to be aware of


Clinical Information Comes in Multiple Forms, Each with its own Issues

1. Quantitative – e.g. maximum calf circumference, serum calcium
  • Attend to units, and the possible need to convert
  • Special forms (1:128, > 10, 1-5, codes mixed with numerics)
2. Ordinal “measures” – e.g. Glasgow eye opening score
  • Answers likely to be “fixed” text or localized codes
3. Nominal (football jerseys) – e.g. blood culture results
  • Same issues as 2.
  • But require small amounts of labor compared to direct manual capture
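The "special forms" mentioned for quantitative results are a common stumbling block when pulling values for analysis. The sketch below is a hypothetical helper (`parse_result` and its category names are not from any standard) that sorts raw result strings into titers, bounded values, ranges, plain numbers, and text, so that each kind can be handled deliberately rather than silently dropped.

```python
import re


def parse_result(raw):
    """Classify a raw result string and return (kind, payload).

    The categories are illustrative; a real system would also consult
    the observation's units and data type.
    """
    s = raw.strip()

    m = re.fullmatch(r"1\s*:\s*(\d+)", s)            # titer such as 1:128
    if m:
        return ("titer", int(m.group(1)))

    m = re.fullmatch(r"([<>]=?)\s*(\d+(?:\.\d+)?)", s)  # bounded such as > 10
    if m:
        return ("bounded", (m.group(1), float(m.group(2))))

    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)", s)  # range such as 1-5
    if m:
        return ("range", (float(m.group(1)), float(m.group(2))))

    try:
        return ("numeric", float(s))                 # ordinary number
    except ValueError:
        return ("text", s)                           # code or free text


for raw in ["1:128", "> 10", "1-5", "9.4", "POS"]:
    print(raw, "->", parse_result(raw))
```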


Narrative Text Can be Good

• The Good:
  • Easy to record/capture and use
  • Can be searched for text patterns
  • Some success in finding specially targeted content with simple NLP

• The Bad:
  • Usually requires some human review of retrieved records
  • Still light years faster than chart review
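A pattern search of this kind can be very small. The following sketch uses made-up report text and a deliberately crude negation check; the report IDs, the `screen` function, and both patterns are assumptions for illustration. It returns candidate reports along with a flag showing why human review is still needed.

```python
import re

# Made-up report snippets, for illustration only.
reports = {
    "rpt-1": "Chest film shows right lower lobe pneumonia.",
    "rpt-2": "No evidence of pneumonia or effusion.",
    "rpt-3": "Fracture of the left distal radius.",
}

PATTERN = re.compile(r"\bpneumonia\b", re.IGNORECASE)
# Crude negation cue: a negating word earlier in the same sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*\bpneumonia\b", re.IGNORECASE)


def screen(reports):
    """Return (report_id, looks_negated) for every report mentioning the term."""
    hits = []
    for report_id, text in reports.items():
        if PATTERN.search(text):
            hits.append((report_id, bool(NEGATION.search(text))))
    return hits


print(screen(reports))
```

The search narrows three reports to two candidates, and the negation flag marks the one a reviewer would likely discard.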


Document Images, Clinical Images and Tracings

• Fast access for human review

• Access to original data – esp. with tracings

• Human assisted measurements of biologic images


By-Patient and Cross-Patient Access

• Clinical repositories are usually justified for clinical care, so data is organized by patient for clinician review

• May lack efficient cross-patient access as needed for research. Three kinds of problems:
  • They may not have the right index structures or computer power for searching
  • They may not have tools for non-programmer access per query
  • The data may be a mess inside – good enough for display to a human but not for automatic searching


Repositories Have Different Scopes

• Local clinical data only
• Local clinical data plus administrative data (very useful)
• Local data supplemented with “external data”, some of which may be “internal”
  • Tumor registry (local – state)
  • ACC data – local – more
  • Social Security death tape – NO INSTITUTION SHOULD BE WITHOUT ONE
  • Medicaid?
  • Other
• Community-wide repository (RHIOs)


Research Uses

• Find potential cases for studies (local)

• Review candidates for study eligibility before trying to enroll (even with no search capabilities)

• Obtain numbers and statistical characteristics of potential candidates for grants (local)


Research Uses – More

• Estimate variance for sample size analysis

• Track outcomes – (labs – death) – longitudinal studies/Cost-benefit studies (local)

• Epidemiologic studies (esp. with community scope)

• Obtain tissue (through pathology reports)

• Link phenotype with genotype (if also collecting genetics)


Problems with Today’s Research Strategies


Not Enough Research Data

• Clinicians are faced with zillions of decisions

• Research helps only some of them
  • Preventive decisions – but even for some of these (pneumonia vaccine) data are soft
  • Many cardiovascular interventions
  • Some anticoagulation interventions

• Little help with special circumstances – age, co-morbidity

• Almost no data for decisions about diagnostic testing, surgery, use of devices

• Almost no help regarding cost benefits


Deeper Problems

• Sample size requirements for trials become difficult/impossible when
  • Event rates are small
  • The difference between treatment and “control” is small: often the case when comparing a new treatment with the best existing treatment
  • We want to quantify the amount of benefit accurately for cost-benefit analysis


Deeper Problems - More

• A study with 10% event rate and 25% difference (big difference) can require enrollment of 10,000 patients.

• To be 95% sure of finding one case of an event with a rate of 1/25,000, one needs to observe 63,000 cases (e.g. rhabdomyolysis)

• Trials can’t cover the entire water front
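The rare-event arithmetic can be checked with a short calculation. The sketch below assumes each case is independent with the same event probability p, so the chance of seeing at least one event in n cases is 1 − (1 − p)^n; the function names are hypothetical. Under that simple assumption the 95% threshold for p = 1/25,000 comes out at roughly 75,000 cases, and 63,000 cases corresponds to about a 92% chance, so the slide's figure may rest on slightly different assumptions.

```python
import math


def cases_needed(event_rate, confidence=0.95):
    """Smallest n with P(at least one event in n independent cases) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - event_rate))


def chance_of_at_least_one(event_rate, n):
    """Probability of observing one or more events among n independent cases."""
    return 1 - (1 - event_rate) ** n


rate = 1 / 25_000
print(cases_needed(rate))
print(round(chance_of_at_least_one(rate, 63_000), 3))
```

Either way, the conclusion of the slide stands: numbers of this size are out of reach for a conventional trial but within reach of a large repository.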


How to Get More for Less

• Collect less on a greater number of patients

• Use repositories
  • To find patients for trials
  • For retrospective analysis of rare events
  • For post-marketing drug toxicities
  • To supplement data collection in traditional clinical trials
  • For gathering outcomes and follow-up in longitudinal studies and large simple trials (community repositories)
  • To find tissue (paraffin) for study


Repository Examples

• Partners analytical database (Murphy SN)
  • Considering labs alone – more than 125 different labs interfaced
  • Uses LOINC as lingua franca for gluing different results together
  • At LEAST (old data):
    • 2.5 million patients with clinical data
    • 700 million clinical facts
    • 750 active researchers
    • 7000 queries/year


More Examples

• The VA – mapping all of their lab tests to LOINC – so data can be pooled across hospitals.

• CRN – collaboration of 10 large “HMOs” for cancer research (Puget Sound, Kaiser, etc.); lab, radiology, and drug data available from the collaborators (Wagner et al.)


Community-Based Repositories

Memphis

North West Indiana

British Columbia

Pediatric hospitals in Ontario

North Jutland, Denmark

Utrecht, Netherlands

Central Indiana (Indianapolis) (INPC)


INPC – What Is It?

• Centralized (federated) clinical repository for central Indiana

• Data delivered from all major Indianapolis hospital systems as HL7 ver. 2.x

• Treat patients from each institution as a separate institution

• Funded by NLM (INPC) and NCI (SPIN)

• Open source software


What is it For?

• Clinical care
  • Eligible providers can access clinical data from all sources in one view when the patient is seeking their care

• Public health
  • Bio-surveillance
  • Reportable disease reporting

• Quality

• Research (today’s subject)


Who Contributes Data?

• Hospitals
  • Five Indianapolis hospital systems (total of 15 separate hospitals)
• Stand-alone labs
• Payers
  • Medicaid (whole state)
    • Encounter ICD & CPT + meds
    • 150 M encounters, 75 M prescriptions
  • WellPoint (largest healthcare company in US – more patients than Medicare)


Who Contributes Data? – More

• Tumor registry (de-identified research only – whole state) – 550 K cases (another “institution”)

• Death tapes (important)
  • Indiana State Public Health Department
  • Social Security (80 million)


INPC Storage Strategy

• Central community database resource and management of mapping, etc.

• Standardized data structure – all use the same software and observation codes.

• Data for each organization in its own physical files

• Combine on-the-fly when needed

• Patient linking needed – because no national ID
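The "separate files, combined on the fly" idea can be sketched as follows. Everything here is hypothetical (the store layout, the organization names, the link table, and `combined_view`); the point is only that each organization's data stays in its own store under its own local patient IDs, and a link table produced by patient matching is consulted at query time.

```python
# Hypothetical per-organization stores, each keyed by that organization's local patient ID.
stores = {
    "hospital_a": {"A-100": [("Hemoglobin", "2006-03-01", 13.0)]},
    "hospital_b": {
        "B-777": [("Hemoglobin", "2006-05-12", 11.5)],
        "B-778": [("Cholesterol", "2006-05-12", 205)],
    },
}

# Hypothetical output of a patient-linking step:
# global person key -> list of (organization, local patient ID).
links = {
    "person-1": [("hospital_a", "A-100"), ("hospital_b", "B-777")],
    "person-2": [("hospital_b", "B-778")],
}


def combined_view(person, stores, links):
    """Gather one person's results from every linked organization at query time."""
    rows = []
    for org, local_id in links.get(person, []):
        for obs_id, date, value in stores[org].get(local_id, []):
            rows.append((date, org, obs_id, value))
    return sorted(rows)  # chronological, since dates are ISO strings


for row in combined_view("person-1", stores, links):
    print(row)
```

Because nothing is physically merged, an organization's data can be withdrawn or excluded from a query without touching anyone else's files.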


All Hospitals Contribute – At Least

• Lab results

• Cardiology reports

• Tumor registry data

• In-patient medication orders (committed)

• TEXT IS GOOD
  • Discharge summaries/admission summaries
  • Operative notes
  • Radiology reports
  • Pathology reports – gets you to existing tissue

• Some contribute all


2006 INPC Data Flows and Content

• Flows
  • More than 150 HL7 message streams
  • More than 100 million separate HL7 messages per year (380 million OBXs)
  • Add about 80 million results per year
  • HL7 ver. 2 works!

• Content
  • 6 million distinct patient registration records (3 M)
  • 850 million discrete results
  • 50 million radiology images
  • 17 million narrative reports


How does the Data Flow from Source to RHIO Repository?

• HL7 messages deliver most of the clinical data.

• DICOM for radiology images.

• NCPDP for outpatient pharmacy.

• LOINC – provides standard codes that define the content of each delivered result.

http://www.regenstrief.org/loinc
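To make the HL7-plus-LOINC combination concrete, here is a minimal sketch of reading a single HL7 version 2 OBX (observation result) segment. The sample segment is invented for illustration, and `parse_obx` is a hypothetical helper, not a full HL7 parser: it ignores escape sequences, repetitions, and alternate delimiters. The field positions used (OBX-3 observation identifier, OBX-5 value, OBX-6 units, OBX-7 reference range) follow the HL7 v2 layout, and `LN` is the coding-system identifier for LOINC.

```python
def parse_obx(segment):
    """Extract the main fields of one pipe-delimited HL7 v2 OBX segment."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    fields += [""] * (12 - len(fields))  # pad so short segments do not raise IndexError

    # OBX-3 is a coded element: code ^ text ^ coding system
    code, name, system = (fields[3].split("^") + ["", "", ""])[:3]
    return {
        "value_type": fields[2],       # OBX-2, e.g. NM = numeric
        "code": code,
        "name": name,
        "coding_system": system,
        "value": fields[5],            # OBX-5
        "units": fields[6],            # OBX-6
        "reference_range": fields[7],  # OBX-7
        "abnormal_flag": fields[8],    # OBX-8
        "status": fields[11],          # OBX-11, e.g. F = final
    }


# Invented example; 718-7 is the LOINC code for hemoglobin mass/volume in blood.
sample = "OBX|1|NM|718-7^Hemoglobin^LN||13|g/dL|12.5-15|N|||F"
result = parse_obx(sample)
print(result["code"], result["value"], result["units"])
```

Each OBX maps naturally onto one stacked record: the LOINC code is the observation ID, and the remaining fields are the context carried with the value.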


Radiology Images - Thumbnail


BIG


BIGGER


BIGGEST 2800 x 2000


EKG Discrete Variables


EKG Tracings


Flow Sheet for Blood Count


Orders


Report Delivery to Office Practices

• 1300+ practices (3800 MDs) at present

• 90% of the active care providers in 9 county region

• Many opportunities to practice access for


Repository Research Uses


INPC Use for Research

• 100’s of queries for grants/year, e.g. to estimate # of cases available for study. To find cases.

• Pull supplemental data for many clinical trials

• Used in 80% of human subjects studies at some point in study

• Remind MDs of studies underway (recruitment)

• Database studies – the greatest:
  • Erythromycin and pyloric stenosis [1]

[1] Mahon BE, Rosenman MB, Kleiman MB. Maternal and infant use of erythromycin and other macrolide antibiotics as risk factors for infantile hypertrophic pyloric stenosis. J Pediatr. 2001 Sep;139(3):380-4.


Tissue Access

• SPIN project
  • NCI-funded collaboration among IU/Regenstrief, Harvard, University of Pittsburgh, UCLA
  • Use query to find clinical cases of interest. Pathology reports provide the link to tissue – paraffin block – 4 M in Indianapolis


Special Query Capabilities

• Access to more than 10,000 distinct variables

• Built-in de-identification processes
  • Dates truncated to year
  • Forbidden fields removed
  • Text reports are scrubbed (examples)

• Build cohort twin databases, then run statistical analyses.
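The first two de-identification steps can be sketched as a simple record filter. This is an assumption-laden illustration, not the SPIN or INPC implementation: the `FORBIDDEN` field list, the `_date` naming convention, and the ISO date strings are all hypothetical, and scrubbing identifiers out of free text is a much harder problem that is not attempted here.

```python
# Hypothetical list of directly identifying fields to drop.
FORBIDDEN = {"name", "address", "phone", "ssn"}


def deidentify(record):
    """Drop forbidden fields and truncate dates to the year.

    Assumes date fields are named '*_date' and hold 'YYYY-MM-DD' strings.
    """
    out = {}
    for field, value in record.items():
        if field in FORBIDDEN:
            continue
        if field.endswith("_date"):
            out[field[: -len("_date")] + "_year"] = value[:4]
        else:
            out[field] = value
    return out


record = {
    "name": "Doe, Jane",
    "ssn": "000-00-0000",
    "birth_date": "1950-02-03",
    "surgery_date": "1995-05-12",
    "hemoglobin": 13,
}
print(deidentify(record))
```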


Special Query Capabilities – More

• Each kind of text report is just another variable – Google-like searches on text, more traditional search criteria for numeric and coded variables

• Tie-in to R (recommended) and pre-packaged statistical routines

• User can do statistical analyses without ever touching any data


SPIN Build Data Set Query


SPIN Look at Data Set


SPIN Look at Individual Scrubbed Report


How do we Glue Data Together?

• Use linking algorithms to tie patients – based on registration data

• Use LOINC codes and mapping tools to tie equivalent variables together
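Both kinds of glue can be illustrated in miniature. The sketch below is deliberately crude and every name in it is hypothetical: `link_key` builds an exact-match key from registration fields, whereas production linking generally has to tolerate typos, name changes, and missing values; `LOCAL_TO_LOINC` stands in for a mapping table that in practice is built and curated with mapping tools.

```python
def link_key(reg):
    """Build a deterministic match key from registration data (crude stand-in)."""
    return (
        reg["last"].strip().upper(),
        reg["first"].strip().upper()[:1],  # first initial only
        reg["dob"],
        reg["sex"].upper(),
    )


# Hypothetical mapping from (source system, local test code) to a LOINC code.
LOCAL_TO_LOINC = {
    ("lab_a", "HGB"): "718-7",
    ("lab_b", "HEMOGLOBIN"): "718-7",
}


def to_loinc(source, local_code):
    """Translate a source-specific test code to LOINC, or None if unmapped."""
    return LOCAL_TO_LOINC.get((source, local_code.upper()))


a = {"last": "Doe ", "first": "jane", "dob": "1950-02-03", "sex": "f"}
b = {"last": "DOE", "first": "Jane", "dob": "1950-02-03", "sex": "F"}
print(link_key(a) == link_key(b))
print(to_loinc("lab_a", "hgb"), to_loinc("lab_b", "Hemoglobin"))
```

Once patients share a key and tests share a code, results from different institutions can be pooled as if they came from one system.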


How do we Get There?

• Glue data from many sources together

• First from your institution

• Then other related data bases (hospital is full of them from tumor registry to heart attack database)

• Rx.Hub – 60% of the nation’s prescriptions

• Don’t forget Death tapes

• Push for community data melds – they could revolutionize clinical …


How do we Get There? – 2

• Force connections between clinical trials systems and institutional systems

• The current state makes no sense

• Demand HL7 bidirectional registration and resulting transmission

• Push for use of HL7 clinical trial segments in orders and reporting


How do we Get There? – 3

• If combining independent sources
  • Need linking routines (NIH should make good tools publicly available)

• Combine for clinical use – de-identify for research use (limited data sets)
  • Make well-tested de-identification tools publicly available


How do we Get There? – 4

• Develop national catalogues for variables and questionnaires. Contribute new variables to the catalogue when existing ones really won’t do.

• Use LOINC – as the catalogue – try it, you’ll like it


LOINC and RELMA Web Site – No Cost Downloads

• Type in LOINC at Google
• Pig


Challenges Exist

• Each Study (and phase) needs ID => Institutional study database

• Ordering systems need option for adding trial ID and phase to the order

• HL7 has segments defined for these – not hard, just need to be articulated


Challenges Exist – 2

• Catch 22's – e.g., recruitment

• Defeats the efficiencies intrinsic to repositories

• Need more rational rules


Challenges Exist – 3

• Managing (and retrieving) consents

• Solvable with scanning and proper workflow


Medicare & Medicaid – Miracles

• Could follow up deaths via SS death tapes (here now)

• Find outcome events (Medicare patients) in the Medicare database

• Track medication and intervention use (Medicare patients) in the Medicare database

• Similar opportunities with Medicaid databases


Research Will Still be Hard

• Clinical systems will not carry all data of interest

• Repositories are not magic.

• But we could collect less if we used the available clinical data where it sufficed and focused the question on strong outcomes


ASIMO at CES 2007

• http://www.youtube.com/watch?v=UOWYIjbKDcc