The Case for Clinical Repositories as a Data Source for Research
Clem McDonald, MD
Director
Lister Hill National Center for Biomedical Communications
National Library of Medicine
National Advisory Research Resources Council (NARRC) Meeting
January 18, 2007
2
Clinical Repositories
What do they do?
What could they do?
3
Why Clinical Repositories
• Many sources of electronic data in an institution
  - Labs, radiology, pharmacy, MD orders, EKGs, dictated reports, radiology images, etc.
• Most of these sources can deliver this data via HL7 messages to another computer
• A repository is a database that provides simple access to all of this data in a unified view.
4
What Data is Available Within the Institution
• Lab data (almost always electronic)
• Medication orders for inpatients
• Radiology reports (text)
• Pathology reports (text)
• Dictation (discharge summary)
• EKGs
• Cardiac echoes
• Endoscopy
• Obstetrical ultrasound
• Nursing observation
5
More Data – “Outside” the Institution
• Event Data – Coded diagnoses and procedures
• Tumor registries – whole country
• Cardiology databases (ACC, ATS, etc.) – whole country
• Federal ESRD database
• Outpatient medications – from pharmacy benefit managers (Rx.Hub)
• More
6
Potential Availability Still More
• Medicaid – procedures, diagnoses and drug use
• Medicare – Diagnoses, procedures and (now) medication use
• Lots of special federal collection instruments – Nursing home, disability, Medicare introduction, etc.
7
Why Codes for Observations Are So Important
• The observation is not defined by the field, as is typical in research and policy databases
8
Flat Data Set (Analytic Conceptualization)
Pat ID   Name      Surgery date  Hgb   DBP  # of BPU  Bypass minutes  Cholest
1234-5   Doe Jane  12May95       13    95   3         80              180
9999-3   Jones T   1Aug95        12.5  88   2         90              230
8888-3   Doe Sam   4June95       16    78   0         80              205
9
Why Observation Codes Are Important
• The observation ID is defined by a “pointer” to a master table – as follows
10
Stacked Data Set (Application Conceptualization)
Pt ID  Relevant Date  Observation ID  Value  Units  Normal Range  Place       Observer
Doe J  12-May-95      Hemoglobin      13     g/dl   12.5-15       St Francis  Dr Smith
Doe J  12-May-95      Hemoglobin      11.5   g/dl   12.5-15       St Francis  Dr Smith
Doe J  12-May-95      Dias BP         95     mmHg   80-140        St Francis  Dr Smith
Doe J  12-May-95      Dias BP         110    mmHg   80-140        St Francis  Dr Smith
Doe J  13-May-95      Bypass minutes  80     min                  St Francis  Dr Sleepwell
Doe J  12-May-95      Cholesterol     180                         St Francis  Dr Bloodbank
Operational Data Base: One Record Per Observation
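The relationship between the stacked and flat layouts can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`pt_id`, `obs_id`, and so on), and the `flatten` helper are assumptions made for the example, not part of any repository described here.

```python
from collections import defaultdict

# Hypothetical stacked observations: one record per result.
stacked = [
    {"pt_id": "Doe J", "date": "1995-05-12", "obs_id": "Hemoglobin", "value": 13.0, "units": "g/dL"},
    {"pt_id": "Doe J", "date": "1995-05-12", "obs_id": "Hemoglobin", "value": 11.5, "units": "g/dL"},
    {"pt_id": "Doe J", "date": "1995-05-12", "obs_id": "Dias BP", "value": 95.0, "units": "mmHg"},
    {"pt_id": "Doe J", "date": "1995-05-12", "obs_id": "Dias BP", "value": 110.0, "units": "mmHg"},
    {"pt_id": "Doe J", "date": "1995-05-13", "obs_id": "Bypass minutes", "value": 80.0, "units": "min"},
]


def flatten(records, pick=min):
    """Collapse stacked records into one flat row per patient.

    `pick` chooses among repeated results (by default the lowest value).
    That choice is an analytic decision, not something the data dictates.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec["pt_id"]][rec["obs_id"]].append(rec["value"])
    return {
        pt: {obs: pick(values) for obs, values in by_obs.items()}
        for pt, by_obs in grouped.items()
    }


flat = flatten(stacked)
```

Going from stacked to flat forces a decision about repeated results (two hemoglobins on the same day), which is one reason the operational systems keep one record per observation.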
11
For Repositories – Need to Think in a Different Data Structure
Instead of dedicating one data field (in a visit record) per result, as is the common model in clinical research:
• Dedicate a record per result.
• That structure is found in every lab, repository, and pharmacy system
• You will never find a field for hemoglobin or cholesterol – or for penicillin V
• The record carries extra pieces of information about each value as follows
12
Limits and Issues Depending Upon How the Data is Represented
Things to be aware of
13
Clinical Information Comes in Multiple Forms, Each with its own Issues
1. Quantitative – e.g. maximum calf circumference, serum calcium
  - Attend to units – and possible need to convert
  - Special forms (1:128, > 10, 1-5, codes mixed with numerics)
2. Ordinal “measures” – e.g. Glasgow eye opening score
  - Answers likely to be “fixed” text or localized codes
3. Nominal (football jerseys) – e.g. blood culture results
  - Same issues as 2.
  - But require small amounts of labor compared to direct manual capture
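The "special forms" of quantitative results mentioned above can be handled with a small classifier. The sketch below is a minimal illustration in Python. The function name `parse_result` and the categories it returns are assumptions made for the example, and a production system would need many more cases.

```python
import re


def parse_result(raw):
    """Classify a raw result string and pull out numeric parts where possible."""
    text = raw.strip()

    # Titer such as "1:128": keep the dilution denominator.
    m = re.fullmatch(r"1\s*:\s*(\d+)", text)
    if m:
        return {"kind": "titer", "value": float(m.group(1))}

    # Censored value such as "> 10" or "<0.5".
    m = re.fullmatch(r"([<>]=?)\s*(\d+(?:\.\d+)?)", text)
    if m:
        return {"kind": "censored", "comparator": m.group(1), "value": float(m.group(2))}

    # Range such as "1-5".
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)", text)
    if m:
        return {"kind": "range", "low": float(m.group(1)), "high": float(m.group(2))}

    # Plain number.
    if re.fullmatch(r"-?\d+(?:\.\d+)?", text):
        return {"kind": "numeric", "value": float(text)}

    # Anything else is treated as a coded or free-text answer.
    return {"kind": "text", "value": text}
```

Keeping the original string alongside the parsed form is usually wise, since how to analyze a censored value or a range depends on the study.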
14
Narrative Text Can be Good
• The Good:
  - Easy to record/capture and use
  - Can be searched for text patterns
  - Some success in finding specially targeted content with simple NLP
• The Bad:
  - Usually requires some human review of retrieved records
  - Still light years faster than chart review
15
Document Images, Clinical Images and Tracings
• Fast access for human review
• Access to original data – esp. with tracings
• Human assisted measurements of biologic images
16
By-Patient and Cross-Patient Access
• Clinical repositories are usually justified for clinical care – so data is organized by patient for clinician review
• May lack efficient cross-patient access as needed for research. Three kinds of problems:
  - They may not have the right index structures or computer power for searching
  - They may not have tools for non-programmer access per query
  - The data may be a mess inside – good enough for display to a human but not for automatic searching
17
Repositories have Different Scopes
• Local clinical data only
• Local clinical data plus administrative data (very useful)
• Local data supplemented with “external data”, some of which may be “internal”
  - Tumor registry (local – state)
  - ACC data – local – more
  - Social Security death tape – NO INSTITUTION SHOULD BE WITHOUT ONE
  - Medicaid?
  - Other
• Community-wide repository (RHIOs)
18
Research Uses
• Find potential cases for studies (local)
• Review candidates for study eligibility before trying to enroll (even with no search capabilities)
• Obtain numbers and statistical characteristics of potential candidates for grants (local)
19
Research Uses – More
• Estimate variance for sample size analysis
• Track outcomes – (labs – death) – longitudinal studies/Cost-benefit studies (local)
• Epidemiologic studies (esp. with community scope)
• Obtain tissue (through pathology reports)
• Link phenotype with genotype (if also collecting genetics)
20
Problems with Today’s Research Strategies
21
Not Enough Research Data
• Clinicians are faced with zillions of decisions
• Research helps only some of them
  - Preventive decisions – but even for some of these (pneumonia vaccine) data are soft
  - Many cardiovascular interventions
  - Some anticoagulation interventions
• Little help with special circumstances – age, co-morbidity
• Almost no data for decisions about diagnostic testing, surgery, use of devices
• Almost no help regarding cost benefits
22
Deeper Problems
• Sample size requirements for trials become difficult/impossible when
  - Event rates are small
  - Differences between treatment and “control” are small: often the case in comparisons of new with best existing treatment
  - We want to quantify the amount of benefit accurately for cost-benefit analysis
23
Deeper Problems - More
• A study with 10% event rate and 25% difference (big difference) can require enrollment of 10,000 patients.
• To be 95% sure of finding one case of an event with a rate of 1/25,000, need to observe 63,000 cases (e.g. rhabdomyolysis)
• Trials can’t cover the entire waterfront
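The arithmetic behind figures like these can be sketched in Python. Both functions are textbook approximations, and the choices of significance level and power are assumptions of this example, not values given in the talk. The first assumes independent cases with a constant event rate. The second uses the normal approximation for comparing two proportions.

```python
import math
from statistics import NormalDist


def cases_to_observe(event_rate, confidence=0.95):
    """Smallest n with P(at least one event in n independent cases) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - event_rate))


def trial_size_per_arm(control_rate, relative_reduction, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for comparing two proportions."""
    p1 = control_rate
    p2 = control_rate * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


rare = cases_to_observe(1 / 25_000)       # rare adverse event
per_arm = trial_size_per_arm(0.10, 0.25)  # 10% event rate, 25% relative difference
```

Under this independence model the rare-event count comes out near 75,000 (the familiar "rule of three": about 3 divided by the event rate), somewhat above the figure quoted on the slide, which may rest on different assumptions. The trial size is several thousand patients per arm and grows quickly as the required power rises or the difference shrinks.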
24
How to Get More for Less
• Collect less on greater number of patients
• Use Repositories
  - To find patients for trials
  - For retrospective analysis of rare events
  - For post-marketing drug toxicities
  - To supplement data collection in traditional clinical trials
  - For gathering outcomes and follow up in longitudinal studies and large simple trials (Community repositories)
  - To find tissue (paraffin) for study
25
Repository Examples
• Partners analytical database (Murphy SN)
  - Considering labs alone – more than 125 different labs interfaced
  - Uses LOINC as lingua franca for gluing different results together
  - At LEAST (old data):
    - 2.5 million patients with clinical data
    - 700 million clinical facts
    - 750 active researchers
    - 7000 queries/year
26
More Examples
• The VA – mapping all of their lab tests to LOINC – so data can be pooled across hospitals.
• CRN – collaboration of 10 large “HMOs” for cancer research (Puget Sound, Kaiser, etc.); lab, radiology, drugs available from the collaborators (Wagner et al.)
27
Community-Based Repositories
Memphis
North West Indiana
British Columbia
Pediatric hospitals in Ontario
North Jutland, Denmark
Utrecht, Netherlands
Central Indiana (Indianapolis) (INPC)
28
INPC – What Is It?
• Centralized (federated) clinical repository for central Indiana
• Data delivered from all major Indianapolis hospital systems as HL7 ver. 2.x
• Treat patients from each institute as separate institution
• Funded by NLM (INPC) and NCI (SPIN)
• Open Source software
29
What is it For?
• Clinical care
  - Eligible providers can access clinical data from all sources in one view when patient is seeking their care
• Public Health
  - Bio-surveillance
  - Reportable disease reporting
• Quality
• Research (Today’s subject)
30
Who Contributes Data?
• Hospitals
  - Five Indianapolis Hospital Systems (total of 15 separate hospitals)
• Stand alone labs
• Payers
  - Medicaid (whole state)
    - Encounter ICD & CPT + meds
    - 150 M encounters, 75 M prescriptions
  - WellPoint (largest healthcare company in US – more patients than Medicare)
31
Who Contributes Data? – More
• Tumor registry (De-identified research only – whole state) – 550 K cases (another “institution”)
• Death tapes (Important)
  - Indiana State Public Health Department
  - Social Security (80 million)
32
INPC Storage Strategy
• Central community database resource and management of mapping, etc.
• Standardized data structure – all use same software and observation codes.
• Data for each organization in its own physical files
• Combine on-the-fly when needed
• Patient linking needed – because no national ID
33
34
All Hospitals Contribute – At Least
• Lab results
• Cardiology reports
• Tumor registry data
• In-patient medication orders (committed)
• TEXT IS GOOD
  - Discharge summaries/admission summaries
  - Operative notes
  - Radiology reports
  - Pathology reports – gets you to existing tissue
• Some Contribute All
35
2006 INPC Data Flows and Content
• Flows
  - More than 150 HL7 message streams
  - More than 100 million separate HL7 messages per year (380 million OBX’s)
  - Add about 80 million results per year
  - HL7 ver. 2 works!!!!
• Content
  - 6 million distinct patient registration records (3 M)
  - 850 million discrete results
  - 50 million radiology images
  - 17 million narrative reports
36
How does the Data Flow from Source to RHIO Repository?
• HL7 messages deliver most of the clinical data.
• DICOM for radiology images.
• NCPDP for outpatient pharmacy.
• LOINC – provides standard codes that define the content of each delivered result.
http://www.regenstrief.org/loinc
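To make the flow concrete, here is a minimal Python sketch of pulling coded results out of the OBX segments of an HL7 version 2 message. The sample message, facility names, and the `parse_obx_segments` helper are made up for illustration. The sketch ignores escape sequences, repetitions, and custom delimiters, which a real interface engine must handle.

```python
# Hypothetical HL7 v2 result message; segments are separated by carriage returns.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LAB|ST_FRANCIS|REPO|INPC|199505121030||ORU^R01|MSG0001|P|2.3",
    "PID|1||1234-5||Doe^Jane",
    "OBR|1|||CBC^Complete blood count^L",
    "OBX|1|NM|718-7^Hemoglobin^LN||13|g/dL|12.5-15|N|||F",
    "OBX|2|NM|2093-3^Cholesterol^LN||180|mg/dL|||||F",
])


def parse_obx_segments(message):
    """Return one dict per OBX segment, using the default '|' and '^' delimiters."""
    results = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX":
            continue
        # Pad short segments so the positional lookups below cannot fail.
        fields += [""] * (8 - len(fields))
        code, name, system = (fields[3].split("^") + ["", "", ""])[:3]
        results.append({
            "value_type": fields[2],       # OBX-2, e.g. NM for numeric
            "code": code,                  # OBX-3 identifier
            "name": name,                  # OBX-3 text
            "coding_system": system,       # OBX-3 coding system, LN = LOINC
            "value": fields[5],            # OBX-5, kept as text
            "units": fields[6],            # OBX-6
            "reference_range": fields[7],  # OBX-7
        })
    return results


observations = parse_obx_segments(SAMPLE_MESSAGE)
```

Each parsed OBX corresponds to one stacked record: the observation code says what was measured, and the value, units, and reference range travel with it.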
37
38
Radiology Images - Thumbnail
39
BIG
40
BIGGER
41
BIGGEST 2800 x 2000
42
EKG Discrete Variables
43
EKG Tracings
44
Flow Sheet for Blood Count
45
46
Orders
47
Report Delivery to Office Practices
• 1300+ practices (3800 MDs) at present
• 90% of the active care providers in 9 county region
• Many opportunities to practice access for
48
Repository Research Uses
49
INPC Use for Research
• 100’s of queries for grants/year, e.g. to estimate # of cases available for study. To find cases.
• Pull supplemental data for many clinical trials
• Used in 80% of human subjects studies at some point in study
• Remind MDs of studies underway (recruitment)
• Database studies – the greatest:
  - Erythromycin and pyloric stenosis 1
1 Mahon BE, Rosenman MB, Kleiman MB. Maternal and infant use of erythromycin and other macrolide antibiotics as risk factors for infantile hypertrophic pyloric stenosis. J Pediatr. 2001 Sep;139(3):380-4.
50
Tissue Access
• SPIN project
  - NCI-funded collaboration among IU/Regenstrief, Harvard, University of Pittsburgh, UCLA
  - Use query to find clinical cases of interest. Pathology reports provide the link to tissue – paraffin block – 4 M in Indianapolis
51
Special Query Capabilities
• Access to more than 10,000 distinct variables
• Built-in de-identification processes
  - Dates truncated to year
  - Forbidden fields removed
  - Text reports are scrubbed (Examples)
• Build cohort twin databases then statistical analyses.
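Two of the de-identification steps listed above, truncating dates to the year and removing forbidden fields, are simple enough to sketch in Python. The field names and the `FORBIDDEN_FIELDS` list are hypothetical, and this is not the SPIN implementation. Scrubbing free text is a much harder problem and is not attempted here.

```python
from datetime import date

# Hypothetical list of directly identifying fields to drop.
FORBIDDEN_FIELDS = {"name", "address", "phone", "ssn", "mrn"}


def deidentify(record):
    """Return a copy of `record` with forbidden fields removed and dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in FORBIDDEN_FIELDS:
            continue  # forbidden field: drop entirely
        if isinstance(value, date):
            cleaned[field] = value.year  # keep only the year
        else:
            cleaned[field] = value
    return cleaned


example = deidentify({
    "name": "Doe, Jane",
    "mrn": "1234-5",
    "surgery_date": date(1995, 5, 12),
    "hemoglobin": 13.0,
})
```

The original record is left unchanged, so the identified copy can remain available for clinical use while the cleaned copy goes to research.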
52
Special Query Capabilities – More
• Each kind of text report is just another variable – Google-like searches on text, more traditional criteria for numeric and coded variables search
• Tie-in to R (RECOMMENDED) and pre-packaged statistical routines
• User can do statistical analyses without ever touching any data
53
SPIN Build Data Set Query
54
SPIN Look at Data Set
55
SPIN Look at Individual Scrubbed Report
57
How do we Glue Data Together?
• Use linking algorithms to tie patients – based on registration data
• Use LOINC codes and mapping tools to tie equivalent variables together
58
How do we Get There?
• Glue data from many sources together
• First from your institution
• Then other related data bases (hospital is full of them from tumor registry to heart attack database)
• Rx.Hub – 60% of the nation’s prescriptions
• Don’t forget Death tapes
• Push for community data melds – they could revolutionize clinical …
59
How do we Get There? – 2
• Force connections between clinical trials systems and institutional systems
• The current state makes no sense
• Demand HL7 bidirectional registration and resulting transmission
• Push for use of HL7 clinical trial segments in orders and reporting
60
How do we Get There? – 3
• If combining independent sources
  - Need linking routines (NIH should make good tools publicly available)
• Combine for clinical use – de-identify for research use (limited data sets)
  - Make well-tested de-identification tools publicly available
61
How do we Get There? – 4
• Develop national catalogues for variables and questionnaires. Contribute new variables to the catalogue when existing ones really won’t do.
• Use LOINC – as the catalogue – try it, you’ll like it
62
LOINC and RELMA Web Site – No Cost Downloads
  - Type in LOINC at Google
  - Pig
63
Challenges Exist
• Each Study (and phase) needs ID => Institutional study database
• Ordering systems need option for adding trial ID and phase to the order
• HL7 has segments defined for these – not hard, just need to be articulated
64
Challenges Exist – 2
• Catch 22's – e.g., recruitment
• Defeats the efficiencies intrinsic to repositories
• Need more rational rules
65
Challenges Exist – 3
• Managing (and retrieving) consents
• Solvable with scanning and proper workflow
66
Medicare & Medicaid – Miracles
• Could follow up deaths via SS death tapes (here now)
• Find outcome events (Medicare patients) in Medicare database
• Track medication and intervention use (Medicare patients) in Medicare database
• Similar opportunities with Medicaid databases
67
Research Will Still be Hard
• Clinical systems will not carry all data of interest
• Repositories are not magic.
• But we could collect less if we used the available clinical data where it sufficed and focused the question on strong outcomes
68
ASIMO at CES 2007
• http://www.youtube.com/watch?v=UOWYIjbKDcc