MY CAB RIDE ON THE WAY IN…
MY CHARGE
1. What are the state-of-the-art approaches to linking data across multiple sources?
2. How will these approaches contribute to increasing opportunities?
3. What are the short and long term challenges?
4. What are the top 2-3 things that need to happen?
5. Who are the relevant stakeholders that must be engaged?
MY PLAN
1. A bit of reflection and philosophy: where have we been?
2. The link between “linking” (connecting healthcare) and unlocking the value proposition of HIT investments
3. Solution description
4. Deep work, not just deep thinking: this is real and works now
5. Suggestions for areas where we need help
LINKING DATA ACROSS TIME AND SPACE (CONNECTING HEALTHCARE)
From All Paper to the Comprehensive Clinicians' Desktop
HIT EVOLUTION – FOCUS ON DIGITIZING PAPER
Paper Medical Records
• Innovation in “filing systems”
Back Office Automation
• Lab and pharmacy departments automated
• No systems for clinicians
• HIS systems (lab, pharmacy, ADT) and silo solutions (OB)
• Limited enterprise workflow
• No IT integration
Best of Breed Solutions
• Fragmented, clinical-area-specific solutions: ICU, ED, OB, OR…
• Minimal integration
• Legacy IT architectures
• Failed execution on integration and site configuration
“Epic”
• Integrated, comprehensive online medical record
• Full HIS/administrative integration
• Decision support and evidence-based guidelines
• Web, wireless, mobile: anytime, anywhere
[Figure: systems landscape by era (HIS, RIS, lab, ADT, pharmacy, PACS, transcription, paper medical records, and departmental ICU, OB, ED, and OR systems) on a timeline labeled 1980s, 1990s, Current Generation, Right Now]
CHALLENGES – REAL AND PERCEIVED ARE MANY: RECENT BLOG ENTRY
Crap process + Technology = Fast Crap
“I like to describe technology as the great magnifier. The challenge is that it will magnify both the good and bad elements of your processes. Fix the process before you apply the technology.” (From: 101 EMR and EHR Tips)
IT - “THE GREAT MAGNIFIER”
• Facility focus: just get my doctors and nurses online
• Traditional HIT focus: EMR, PMIS, departmental HIT, etc.
• Pace of adoption quickening
• Fragmented healthcare: geographic and sub-specialization
• Trends continue
• Islands of automation
RISE OF THE DOTS… DIGITIZATION OF HEALTHCARE OVER THE LAST 30 YEARS
DUBIOUS INHERENT ROI… “FREE MARKET” STATE PRIOR TO HITECH
• 20% penetration = 80% failure rate
• $39 billion incentive (“bribe”) from the feds to adopt technology: “meaningful use”
• Why?
OFFICE AUTOMATION IN THE 1980s
PRODUCTIVITY GAP AFTER $1T SPEND?
CLINICAL CONTEXT – A REMINDER
• Chronic disease has replaced acute disease as the dominant health problem
• Chronic disease is now the principal cause of disability & use of health services
• Chronic disease consumes 78% of health expenditures
Holman H. JAMA Vol 292, No.9, Sept 1, 2004
SO… WHY SHOULD WE CARE?
ACUTE vs. CHRONIC
• Acute = episodic
  – Commonly a cure
  – Inexperienced, passive patient
  – Physician administers treatment
• Chronic = continuous
  – Rarely a cure
  – Patient lives indefinitely with the disease, symptoms, and consequences
  – Persistent treatment
  – Patient has an integral role in the treatment process
  – Multiple caregivers involved in the process over long periods of time
Holman H. JAMA Vol 292, No.9, Sept 1, 2004
WHY EXCHANGE OR LINKING?
PRACTICE OF CHRONIC CARE MEDICINE REQUIRES A DIFFERENT APPROACH
• The nature of care changes over time
• Must be managed over time as the disease evolves with shifting severity, pace, and treatments
• Good management is an unfolding process, best provided by a multidisciplinary team of professionals
• Continuity and coordination of care are essential
• Information about the patient needs to move through the “system” as fast as the patient does
THE ULTIMATE GOAL: QUALITY… REQUIRES INTEROPERABLE DATA
• Access to the longitudinal patient history before clinical decisions are made
• Improved coordination of care
• Reduced costs
• Reduced fragmentation
• Improved outcomes
MISSION
CONNECTED HEALTH SYSTEMS…
The goal is not to move from “paper silos” to “electronic silos”
The goal is an electronic health system that supports and requires the movement of interoperable health information supporting:
•Continuity of Care
•Population Needs (pandemics and other disasters)
•Bench to Bedside Research
•Disability Determination
In a January 9, 2009 speech at George Mason University, President-elect Barack Obama said:
“To improve the quality of our healthcare while lowering its cost, we will make the immediate investments necessary to ensure that, within five years, all of America’s medical records are computerized.”
THE “POST-EMR” ERA IS UPON US
• Just like “post-PC”
• The EMR is just a means of digitizing information; by itself, it is not sufficient to accrue transformational, systemic value
IT’S NOT ABOUT THE DATA/HIT ITSELF: IT IS WHAT WE DO WITH IT AND HOW
• Visualization
• Episode Grouping
• Transformation/Translation
• White-space management: not about digitizing “clinical encounters” but about transitions of care (TOC) and care coordination
• Extrapolation
• Actionable information and analytics: prospective, not just retrospective
vSPR – “Expedia for Healthcare”
[Screenshot callouts]
• Problem summary list and care timeline across all sites of care
• Today's problem summary list: on a longitudinal basis, complex patients can be too difficult to “understand” quickly
• Care timeline: procedure/Med Hx is currently distinct from the problems
• Medical episode groups/disease clusters (e.g., asthma maintenance; asthma with complication; trauma: fracture) compress the “row noise”
• Result: a crisp, cogent display; you can now see the forest through the trees and compress the display overall
Step 1: Demonstrates the ability to qualify initiatives, e.g., readmissions are 1% of patients and 11% of costs.
Step 2: Validating and calibrating a predictive model. Demonstrates our ability to implement an industry-standard model (LACE) and calibrate it for a care coordination initiative (i.e., select a score cutoff that yields a manageable number of cases for the case management team).
Step 3: A predictive model in production at CHNw: operationalize the intervention. Real-time reporting is integrated with the full patient record, so case managers do zero hunting and gathering.
Step 4: The effectiveness of a proposed care coordination intervention can be tracked over time so that iterative improvements can be made.
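Steps 2 and 3 depend on computing a LACE score and choosing a cutoff the case management team can absorb. The sketch below encodes the LACE point scheme as published by van Walraven et al. (CMAJ, 2010), to the best of our understanding; `pick_threshold` and the capacity figure are hypothetical illustrations of the calibration idea, not CHNw's implementation.

```python
from typing import Sequence


def los_points(days: int) -> int:
    """L: points for length of stay in days."""
    if days < 1:
        return 0
    if days <= 3:
        return days  # 1, 2, or 3 days score 1, 2, or 3 points
    if days <= 6:
        return 4
    if days <= 13:
        return 5
    return 7


def charlson_points(charlson: int) -> int:
    """C: points from the Charlson comorbidity index (4 or more scores 5)."""
    return charlson if charlson <= 3 else 5


def lace_score(los_days: int, emergent: bool, charlson: int, ed_visits_6mo: int) -> int:
    """LACE = L + A + C + E, giving a total between 0 and 19."""
    acuity = 3 if emergent else 0   # A: emergent (acute) admission
    ed = min(ed_visits_6mo, 4)      # E: ED visits in the prior 6 months, capped at 4
    return los_points(los_days) + acuity + charlson_points(charlson) + ed


def pick_threshold(scores: Sequence[int], capacity: int) -> int:
    """Hypothetical calibration: lowest cutoff whose flagged caseload fits capacity."""
    for cutoff in range(20):
        if sum(1 for s in scores if s >= cutoff) <= capacity:
            return cutoff
    return 20  # no cutoff fits, so nobody is flagged


print(lace_score(5, True, 2, 1))                       # 4 + 3 + 2 + 1 = 10
print(pick_threshold([10, 12, 5, 19, 8], capacity=2))  # 11
```

Because the flagged count only falls as the cutoff rises, the lowest cutoff that fits capacity is the one that uses the team's capacity most fully.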
NEED FOR LINKING: A PICTORIAL
• Profound and basic need
  – Subject duplicity
  – Longitudinal follow-up: acute events in CDa, prescription history in EMR1, ER visits in CD1, genome in CD2
• Can't really anonymize first and then link
• Issues
  – Privacy concern (not TPO, i.e., treatment, payment, or health care operations)
  – Business issues
[Figure: Claims Dataset A, Claims Dataset B, EMR1]
WARNING: ACRONYMS AHEAD
RECORD LINKING BASICS
• Methodology
  – Typically pairwise comparisons
    • Vikas Kheterpal vs. Vick Khetarpal
    • Vick Khetarpal vs. Victor Kheterpal
  – Probabilistic versus deterministic
• Issues in general
  – False positives (a specificity problem)
  – False negatives (a sensitivity problem)
• Underlying data characteristics
  – Dirtiness
  – Transliteration
  – Uniformity of datasets
• Name issues (aliases, ethnicity, marriage)
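To make the pairwise comparison concrete, here is a minimal sketch that scores the name variants above with a bigram Dice coefficient. The metric is an illustrative stand-in; the deck does not say which string similarity CryptoRLS uses.

```python
from itertools import combinations


def bigrams(s: str) -> set:
    """Distinct 2-character substrings, ignoring case and spaces."""
    s = s.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient on bigram sets: 2|A ∩ B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:  # strings too short to have bigrams
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


names = ["Vikas Kheterpal", "Vick Khetarpal", "Victor Kheterpal"]
for x, y in combinations(names, 2):  # every pair, as in pairwise linking
    print(f"{x} vs. {y}: {dice(x, y):.2f}")
```

For example, "Kheterpal" and "Khetarpal" share 6 of their 8 distinct bigrams each, giving 12/16 = 0.75.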
HEALTHCARE-SPECIFIC ISSUES
• Intolerance for false positives
• Lack of uniformity of identifiers
• Multitude of data sources
  – More than pairwise links
  – N-way links within a network
• Core (minimum) identifiers versus tie-breakers
  – Core: Recipient #, SSN, name, gender, DOB
  – Supplementary: address, phone, etc.
• Intentionally dirty data
LINKAGE METHODOLOGIES
Deterministic
• Rapid implementation
• Simple calculation
• Typically relies on accurate data
• May not function well with other data sets
Probabilistic/Statistical
• Complex implementation
• Computationally intensive
• More forgiving of data errors
• Algorithm is customized to the data being linked
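The deck does not name its probabilistic model. As an assumption, the sketch below uses the classic Fellegi–Sunter formulation, in which each field contributes a log-likelihood weight built from m = P(agree | true match) and u = P(agree | non-match), and contrasts it with an all-fields-exact deterministic rule. The m/u values are invented for illustration; in practice they are estimated from the data being linked, which is one reason probabilistic linkage is customized to each dataset.

```python
import math


def fs_weight(agrees: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 weight for one field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


# Hypothetical (m, u) probabilities per field.
FIELDS = {
    "ssn": (0.95, 0.0001),
    "last": (0.90, 0.01),
    "dob": (0.97, 0.003),
    "gender": (0.98, 0.5),
}


def deterministic_match(a: dict, b: dict) -> bool:
    """Deterministic rule: every field must agree exactly."""
    return all(a[f] == b[f] for f in FIELDS)


def probabilistic_score(a: dict, b: dict) -> float:
    """Sum of per-field weights, to be compared against data-specific thresholds."""
    return sum(fs_weight(a[f] == b[f], m, u) for f, (m, u) in FIELDS.items())


a = {"ssn": "123456789", "last": "malhotra", "dob": "1968-11-08", "gender": "F"}
b = {"ssn": "123456789", "last": "kheterpal", "dob": "1968-11-08", "gender": "F"}
print(deterministic_match(a, b))            # False: last name differs
print(round(probabilistic_score(a, b), 1))  # strongly positive despite the disagreement
```

This illustrates the trade-off in the table: the deterministic rule rejects the pair on a single name change, while the weighted score tolerates it.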
NAME SCALING
CryptoRLS™ MPI
• Combination of deterministic and probabilistic methods
• Core probabilistic algorithm proven on the largest data set in the world
  – US Census Bureau
  – SAMHSA (federal agency)
• Built from the ground up for specificity
  – Alias names
  – Transliteration
  – Address standardization
  – Consistency checker
World-Class Record Linking: High Specificity and Sensitivity Proven
[Figure: linking pipeline with components Record To Link, Record Store, Blocked Records, Definite Match Linker, Possible Match Linker, Definite Matches, Possible Matches, and Consistency Checker]
Possible Match Linker
  TotalScore = SSN sim × SSN wt
             + FN sim × FN wt
             + LN sim × LN wt
             + DOB sim × DOB wt
             + Gender × Gender wt
             + Race × Race wt
             + ZipCode × ZipCode wt
             + …
  IF TotalScore > Threshold THEN Match
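A runnable reading of the Possible Match Linker pseudocode, under stated assumptions: `difflib.SequenceMatcher.ratio()` stands in for the unspecified similarity function, and the weights and threshold are hypothetical values chosen only so the example runs.

```python
from difflib import SequenceMatcher

# Hypothetical weights (summing to 1.0) and threshold; the deck gives no values.
WEIGHTS = {"ssn": 0.30, "first": 0.15, "last": 0.20, "dob": 0.20,
           "gender": 0.05, "race": 0.05, "zip": 0.05}
THRESHOLD = 0.85


def sim(x: str, y: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is a stand-in metric."""
    if not x or not y:
        return 0.0  # a missing value contributes nothing
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def total_score(a: dict, b: dict) -> float:
    """TotalScore = sum over fields of (field similarity * field weight)."""
    return sum(sim(a.get(f, ""), b.get(f, "")) * w for f, w in WEIGHTS.items())


def possible_match(a: dict, b: dict) -> bool:
    """IF TotalScore > Threshold THEN Match."""
    return total_score(a, b) > THRESHOLD


a = {"ssn": "123456789", "first": "Pete", "last": "Doe",
     "dob": "2003-01-02", "gender": "M", "zip": "29201"}
b = dict(a, first="Peter", ssn="123456798")  # nickname plus one SSN transposition
print(round(total_score(a, b), 2), possible_match(a, b))
```

Treating a missing field as zero similarity is a conservative choice; a real implementation might instead renormalize the weights over the fields both records actually have.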
Definite Match Linker
  IF   (SSN match AND DOB match AND Gender match
        AND (FirstName 80% sim OR LastName 90% sim))
  OR   (FirstName 80% sim AND LastName 90% sim AND DOB match AND Gender match
        AND (ZipCode match OR Race match))
  OR   …
  THEN Match
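The two Definite Match Linker rules shown above, sketched in Python. Reading "80% sim" as a similarity of at least 0.80, using `difflib` as a stand-in similarity, and treating a missing value as non-matching are all assumptions. The elided "OR …" rules are not filled in.

```python
from difflib import SequenceMatcher


def sim(x, y) -> float:
    """Stand-in string similarity in [0, 1] (difflib ratio); missing -> 0."""
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def eq(a: dict, b: dict, field: str) -> bool:
    """Exact agreement on a field that is present in record a."""
    return bool(a.get(field)) and a.get(field) == b.get(field)


def definite_match(a: dict, b: dict) -> bool:
    fn_ok = sim(a.get("first"), b.get("first")) >= 0.80
    ln_ok = sim(a.get("last"), b.get("last")) >= 0.90
    rule1 = (eq(a, b, "ssn") and eq(a, b, "dob") and eq(a, b, "gender")
             and (fn_ok or ln_ok))
    rule2 = (fn_ok and ln_ok and eq(a, b, "dob") and eq(a, b, "gender")
             and (eq(a, b, "zip") or eq(a, b, "race")))
    return rule1 or rule2  # further rules ("OR …") are elided in the deck


a1 = {"first": "Bithika", "last": "Kheterpal", "dob": "1968-11-08", "gender": "F", "ssn": "SSN1"}
a2 = dict(a1, last="Malhotra")  # married name, same SSN
print(definite_match(a1, a2))   # True via the first rule
```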
Blocking Query
  IF   SSN match
  OR   (FirstName match AND LastName match)
  OR   (DOB match AND Gender match)
  THEN Block
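Blocking limits the expensive comparisons to records that share at least one cheap key. A minimal sketch, assuming an in-memory record store and one index key per OR-branch of the Blocking Query; the helper names are hypothetical.

```python
from collections import defaultdict


def block_keys(r: dict) -> list:
    """One key per OR-branch of the blocking query; incomplete branches are skipped."""
    keys = []
    if r.get("ssn"):
        keys.append(("ssn", r["ssn"]))
    if r.get("first") and r.get("last"):
        keys.append(("name", r["first"].lower(), r["last"].lower()))
    if r.get("dob") and r.get("gender"):
        keys.append(("dob_gender", r["dob"], r["gender"]))
    return keys


def build_index(store: dict) -> dict:
    """Map each blocking key to the IDs of stored records that produce it."""
    index = defaultdict(set)
    for rid, rec in store.items():
        for k in block_keys(rec):
            index[k].add(rid)
    return index


def blocked_records(rec: dict, index: dict) -> set:
    """Candidate IDs sharing at least one blocking key with the incoming record."""
    out = set()
    for k in block_keys(rec):
        out |= index.get(k, set())
    return out


store = {
    "r1": {"ssn": "123456789", "first": "Pete", "last": "Doe", "dob": "2003-01-02", "gender": "M"},
    "r2": {"first": "Jane", "last": "Roe", "dob": "1970-05-05", "gender": "F"},
}
idx = build_index(store)
print(blocked_records({"first": "pete", "last": "DOE"}, idx))  # {'r1'}
```

Only the blocked candidates then go on to the definite and possible match linkers.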
The Problem: A Centralized MPI Is a Critical Weak Link for Privacy
1. Data aggregation increases the value to attackers: one stop … all citizens
2. A large number of entities with a legitimate need to access the RLS increases vulnerability
3. “Discoverability” of information by government agencies
4. Threat from within
5. Fostering “trust” among competing provider entities
“BLINDFOLDED RECORD LINKING”
1. Just as a bank does not know the contents of a safe deposit box, Crypto-RLS can provide linking without even KNOWING the contents
2. No risk in managing the entire regional population
3. No centralization of clinical data
4. Protects from:
  • Internal threats
  • Disgruntled employees
  • External hacks
  • Inadvertent loss (theft, backup distribution)
5. Web services provide a “catcher-pitcher” handoff
Use cryptographic techniques like hashing to obfuscate the PHI:
“The SHA hash functions are a set of cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm.” (Wikipedia)
“The four hash algorithms specified in this standard are called secure because, for a given algorithm, it is computationally infeasible 1) to find a message that corresponds to a given message digest, or 2) to find two different messages that produce the same message digest” (NIST FIPS 180-2, Secure Hash Standard, 2002)
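A minimal sketch of the hashing step with Python's standard `hashlib`: standardize the identifier, then take its SHA-256 digest. The digits-only standardization is an assumed example of a standardizer pass.

```python
import hashlib
import re


def standardize_ssn(raw: str) -> str:
    """Keep digits only, so '123-45-6789' and '123 45 6789' hash identically."""
    return re.sub(r"\D", "", raw)


def fingerprint(value: str) -> str:
    """One-way SHA-256 digest (hex) of a standardized identifier."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


h1 = fingerprint(standardize_ssn("123-45-6789"))
h2 = fingerprint(standardize_ssn("123 45 6789"))
h3 = fingerprint(standardize_ssn("123-45-6798"))  # one transposed digit pair
print(h1 == h2, h1 == h3)  # True False
```

Formatting differences vanish after standardization, but a single transposition yields an unrelated digest, which is the problem that permutation generation addresses. Because identifiers such as SSNs have a small value space, unkeyed digests can be reversed by enumeration; keyed hashing such as HMAC-SHA-256 (Python's `hmac` module) is a common hardening, and the deck does not say whether it is used here.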
How to Perform Effective “Linking” and Hashing
• Misspellings would generate completely different “hashes” or fingerprints
• Different permutations depending on the type of identifier
  – Names: bigrams, nicknames, NYSIIS codes, etc.
    • Bigram: all subsets of the set of bigrams (pairs of consecutive characters) in a string
    • Example: Pete → {“pe”, “et”, “te”}, {“pe”, “et”}, {“pe”, “te”}, {“et”, “te”}, {“pe”}, {“et”}, {“te”}
  – Numeric: transpositions, off-by-one, etc.
  – Date: month-day swap, off-by-one, etc.
• Permutations provide the ability to partially match identifiers even though we're blinded
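Permutation generation for the identifier types listed above, as a hedged sketch: all non-empty subsets of a name's bigram set (reproducing the seven "Pete" subsets), adjacent-digit transpositions and off-by-one digits for numbers, and a month-day swap for dates. Similarity scores, nicknames, NYSIIS codes, and wildcard forms such as 12345678x are left out, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def bigram_subsets(name: str) -> List[frozenset]:
    """All non-empty subsets of the name's bigram set, largest first."""
    s = name.lower()
    grams = sorted({s[i:i + 2] for i in range(len(s) - 1)})
    return [frozenset(c) for k in range(len(grams), 0, -1)
            for c in combinations(grams, k)]


def ssn_transpositions(ssn: str) -> List[str]:
    """Swap each pair of adjacent, differing digits (123456789 -> 123456798, ...)."""
    return [ssn[:i] + ssn[i + 1] + ssn[i] + ssn[i + 2:]
            for i in range(len(ssn) - 1) if ssn[i] != ssn[i + 1]]


def ssn_off_by_one(ssn: str) -> List[str]:
    """Move each digit up or down by one where valid (123456789 -> 123456788, ...)."""
    out = []
    for i, ch in enumerate(ssn):
        for nd in (int(ch) - 1, int(ch) + 1):
            if 0 <= nd <= 9:
                out.append(ssn[:i] + str(nd) + ssn[i + 1:])
    return out


def month_day_swap(month: int, day: int, year: int) -> Optional[Tuple[int, int, int]]:
    """1/2/03 -> 2/1/03; None when the swap is a no-op or gives an invalid month."""
    if day > 12 or day == month:
        return None
    return (day, month, year)


print(len(bigram_subsets("Pete")))  # 7
print(month_day_swap(1, 2, 2003))   # (2, 1, 2003)
```

Each permutation would then be hashed, so two sites can find a shared fingerprint even when one of them holds the misspelled or swapped value.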
Blinded Record Index Steps
Records at participating institutions with identifiers (First Name, Last Name, SSN, DOB, …)
Within each site adapter:
1) Standardizer pass (delimiters, junk values, spaces)
2) Create permutations of identifiers with associated similarity scores (NYSIIS, Soundex, transpositions, nicknames, …)
3) One-way hash the permutations
At the “center”:
4) Persist hashed permutations in the BRI
5) Find other records in the BRI with matching permutations
6) Link to other records based on identifier similarities
7) Group linked records into patients
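Steps 3 to 5 can be sketched as an inverted index from hash to postings. This is a hypothetical in-memory stand-in for the BRI; prefixing each permutation with its field name before hashing (so a DOB permutation cannot collide with an SSN permutation) is our assumption, not something the deck specifies.

```python
import hashlib
from collections import defaultdict


def h(field: str, value: str) -> str:
    """Step 3: one-way SHA-256 of a field-tagged permutation (tagging is assumed)."""
    return hashlib.sha256(f"{field}:{value}".encode("utf-8")).hexdigest()


class BlindedRecordIndex:
    """Hypothetical in-memory BRI: hash -> list of (record_id, field, score)."""

    def __init__(self) -> None:
        self._postings = defaultdict(list)

    def persist(self, record_id: str, hashed: list) -> None:
        """Step 4: store (field, digest, score) triples; no plaintext is stored."""
        for field, digest, score in hashed:
            self._postings[digest].append((record_id, field, score))

    def candidates(self, record_id: str, hashed: list) -> set:
        """Step 5: other records that share at least one hashed permutation."""
        found = set()
        for _field, digest, _score in hashed:
            for other, _f, _s in self._postings.get(digest, []):
                if other != record_id:
                    found.add(other)
        return found


# Site A and site B each send only hashed DOB permutations with their scores.
site_a = [("dob", h("dob", "1/2/03"), 1.0), ("dob", h("dob", "2/1/03"), 0.7)]
site_b = [("dob", h("dob", "2/1/03"), 1.0)]
bri = BlindedRecordIndex()
bri.persist("A", site_a)
bri.persist("B", site_b)
print(bri.candidates("A", site_a))  # {'B'}
```

The stored scores are what steps 6 and 7 would use to weigh and group the candidate links.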
Permutation and Hashing
Record: First Name = Pete; Last Name = Doe; DOB = 1/2/03; SSN = 123-45-6789
Permute: First Name
Permutation            Score
{“et”, “pe”, “te”}     1.0
{“et”, “pe”}           0.7
{“et”, “te”}           0.7
“peter”                0.8
…                      …
Permute: SSN
Permutation            Score
123456789              1.0
123456798              0.8
123456788              0.9
12345678x              0.7
…                      …
Permute: DOB
Permutation            Score
1/2/03                 1.0
2/1/03                 0.7
1/2/04                 0.7
…                      …
Hash (each permutation keeps its score)
Hash                   Score
0x0123ab49c…           1.0
0xa84f04…              0.7
0x7fb885…              0.7
0x530de8…              0.8
…                      …
Similarity Calculation
Record A
FName Hash       Score
0xbb23a4b9c…     1.0
0xa84f04…        0.8
0xbe8785…        0.7
0x530de8…        0.7
…                …
DOB Hash         Score
0x0123ab49c…     1.0
0xa74f04…        0.7
0x7fb885…        0.7
0x530de8…        0.8
…                …
SSN Hash         Score
0x530de8…        1.0
0xafb885…        0.7
0xdea745…        0.7
0x8…             0.8
…                …
Record B
FName Hash       Score
0x5123ab49c…     1.0
0x8af404…        0.7
0xfee885…        0.7
0xa84f04…        0.8
…                …
DOB Hash         Score
0x0123ab49c…     1.0
0xa74f04…        0.7
0x7fb885…        0.7
0x530de8…        0.8
…                …
SSN Hash         Score
0x1f23ab49c…     1.0
0xa84f04…        0.7
0xafb885…        0.7
0x530de8…        0.8
…                …
A↔B Similarity (highest-scoring match per identifier; lower-scoring matches set aside)
Identifier       Score
First Name       0.64
Last Name        0.9
DOB              1.0
SSN              0.8
…                …
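Working backward from the example numbers, each identifier's A↔B score appears to be the best product of the two records' scores over the hashes they share: 0xa84f04… scores 0.8 in both first-name tables (0.8 × 0.8 = 0.64), and 0x530de8… scores 1.0 in A's SSN table and 0.8 in B's (0.8). The sketch encodes that inferred rule; the keys are the slide's truncated hash values, not real digests.

```python
def identifier_similarity(a: dict, b: dict) -> float:
    """Best product of scores over hashes present in both records (0.0 if none)."""
    shared = a.keys() & b.keys()
    return max((a[k] * b[k] for k in shared), default=0.0)


a_fname = {"0xbb23a4b9c": 1.0, "0xa84f04": 0.8, "0xbe8785": 0.7, "0x530de8": 0.7}
b_fname = {"0x5123ab49c": 1.0, "0x8af404": 0.7, "0xfee885": 0.7, "0xa84f04": 0.8}
a_ssn = {"0x530de8": 1.0, "0xafb885": 0.7, "0xdea745": 0.7}
b_ssn = {"0x1f23ab49c": 1.0, "0xa84f04": 0.7, "0xafb885": 0.7, "0x530de8": 0.8}

print(round(identifier_similarity(a_fname, b_fname), 2))  # 0.64
print(round(identifier_similarity(a_ssn, b_ssn), 2))      # 0.8 (0xafb885 gives only 0.49)
```

Only shared hashes matter, so the center can compute these scores without ever seeing the underlying identifiers.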
Link Creation
Rule-based: the A↔B similarity scores are checked against each condition
Condition                                           Outcome
FName > .6 + LName > .8 + SSN > .7 + DOB = 1        Definite Link
SSN > .7 + DOB > .9 + LName > .8                    Possible Link
…                                                   …
Result: Definite Link between A and B
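The rule table can be sketched as an ordered list of predicates over the A↔B scores. Reading "+" as AND, checking rules top to bottom, and returning "No Link" when nothing fires are assumptions; the elided rules are not filled in.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]

# Rules transcribed from the table, with "+" read as AND.
RULES: List[Tuple[Callable[[Scores], bool], str]] = [
    (lambda s: s["first"] > 0.6 and s["last"] > 0.8 and s["ssn"] > 0.7 and s["dob"] == 1.0,
     "Definite Link"),
    (lambda s: s["ssn"] > 0.7 and s["dob"] > 0.9 and s["last"] > 0.8,
     "Possible Link"),
]


def classify(scores: Scores) -> str:
    """Outcome of the first rule that fires, checked in table order."""
    for condition, outcome in RULES:
        if condition(scores):
            return outcome
    return "No Link"  # hypothetical default when no rule fires


print(classify({"first": 0.64, "last": 0.9, "dob": 1.0, "ssn": 0.8}))  # Definite Link
```

With the example scores, 0.64 > 0.6, 0.9 > 0.8, 0.8 > 0.7, and DOB = 1.0, so the first rule fires.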
Consistency Checker Pass
• Typical RL approach: pairwise links (even with advanced Bayesian probabilistic algorithms)
  – Only as good as the core algorithm
  – The weakest link creates issues
• Healthcare: n-way linkage
  – A1: Bithika S. Kheterpal, 11/8/68, F, SSN1
  – A2: Bithika S. Malhotra, 11/8/68, F, SSN1
  – A3: Bithika S. Malhotra, 11/8/68, F, missing SSN
  – A1 = A2 and A2 = A3, but pairwise A1 ≠ A3
• The consistency checker will promote the A1–A3 link.
• Impact: HUGE improvement in sensitivity without sacrificing specificity
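One plausible mechanism for promoting the A1–A3 link is transitive closure over the pairwise links, sketched here with union-find. The deck does not describe the consistency checker's internals, and a production checker likely applies further tests before promoting a link; this shows only the grouping step.

```python
from typing import Dict, List, Set, Tuple


def group_links(records: List[str], links: List[Tuple[str, str]]) -> List[Set[str]]:
    """Union-find over pairwise links; each resulting set is one patient."""
    parent: Dict[str, str] = {r: r for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for r in records:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())


pairwise = [("A1", "A2"), ("A2", "A3")]  # the A1-A3 pair failed on its own
patients = group_links(["A1", "A2", "A3", "B1"], pairwise)
print(patients)
```

A1 and A3 end up in the same patient group through A2, even though no pairwise comparison linked them directly.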
Production Use: Three Use Cases
• Infrastructure for robust statewide or regional HIE or care coordination initiatives
• Infrastructure for determining >30 day mortality rate (blinded death master file lookup)
• Infrastructure for linked multi-center trials that need a very large n across multiple centers, while preserving privacy (no patient consent required)
THREE STATES – SIMILAR NEEDS
• South Carolina: 32,000 sq miles; 4.01M population; bordering North Carolina and Georgia
• West Virginia: 24,231 sq miles; 1.9M population; landlocked, bordering Ohio, Pennsylvania, Kentucky, and Virginia
• Alabama: 52,423 sq miles; 4.45M population; bordering Mississippi, Tennessee, Georgia, and Florida
1. Algorithm powers the statewide HIE infrastructure in three states
2. Connections to claims (medicaid/CHIP), public health (IZ, ELR), and provider clinical data (hospitals, clinics)
3. Gateway to federal partners (VA, etc.) and interstate partners
4. In production for clinical use, so certainly acceptable for population and secondary uses
How Do We Know It Works?
• Live in three states – dozens of hospitals for clinical care
• SCHIEx: 6.2M patients over 10 years from more than 200 distinct facilities
• Weekly statistical sample manually reviewed
• Statistical process compared with clinical process; the clinical RLS has higher specificity
• No reported false positives in 3 years of production use
ANOTHER APPLICATION – DETERMINING MORTALITY RATE PAST 30 DAYS
• How do you manage 30+ day mortality information for “your” clinical intervention?
• Individual contact: operationally very expensive, with dubious results
• Other approaches, like death database lookups, can be more efficient
  – National databases
    • SSA publishes the Death Master File
    • NTIS distributes it through a variety of lookup methods
  – How to subscribe to and use this lookup without disclosing PHI?
    • Most of the patients you look up will still be alive (hopefully ☺)
    • How to do it with minimal IT headache?
DMF Background
Two Basic Approaches for Lookup
• Look up using various websites
  – Individual lookup
  – Submit files
  – Web-based queries
  – All suffer from one basic issue: you must share your PHI with an external entity to perform the lookup
• Download the file and build a tool internally
  – No disclosure of your PHI to external entities
  – Issue: IT effort plus approximately $5K in master file license fees annually
Privacy Protecting Lookup
• Death database lookup
  – Application to convert PHI to a one-way hashed code
  – Updated to use the SHA-256 hashing algorithm
  – Pings a central server for lookup
  – Match or no-match, plus date of death, returned
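A minimal sketch of both halves of the privacy-protecting lookup: the facility hashes identifiers locally, and the server answers match/no-match plus date of death from an index keyed only by hashes. Keying on SSN plus DOB, the in-memory index, and all names are hypothetical; the deck does not specify the widget's fields, file format, or protocol.

```python
import hashlib
from typing import List, Optional, Tuple


def hash_key(ssn: str, dob: str) -> str:
    """Facility side: SHA-256 of standardized SSN plus DOB (field choice is hypothetical)."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return hashlib.sha256(f"{digits}|{dob}".encode("utf-8")).hexdigest()


class HashedDeathIndex:
    """Server side (hypothetical): built from DMF rows, keyed only by hashes."""

    def __init__(self, dmf_rows: List[Tuple[str, str, str]]) -> None:
        # Each row is (ssn, dob, date_of_death); only the hash is kept as the key.
        self._index = {hash_key(ssn, dob): dod for ssn, dob, dod in dmf_rows}

    def lookup(self, key: str) -> Tuple[bool, Optional[str]]:
        """Return (match, date_of_death) for one submitted hash."""
        dod = self._index.get(key)
        return dod is not None, dod


# Fictional DMF row, for illustration only.
server = HashedDeathIndex([("123-45-6789", "1940-01-01", "2012-03-04")])
print(server.lookup(hash_key("123456789", "1940-01-01")))  # (True, '2012-03-04')
print(server.lookup(hash_key("987654321", "1950-02-02")))  # (False, None)
```

Since most submitted patients are still alive, most queries return only "no match," and the facility never sends plaintext PHI.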
Self Service Tool To Invoke Lookup
Sample Text File Within Your Facility
Sample Results File
SIMPLE TO ADOPT INTO RESEARCH PROTOCOL
• Get the URL from the website
• Download the hashing widget from the website
• Prepare Text File
• Submit File
• Get Answer Back
Production Use: Three Use Cases
• Infrastructure for robust statewide or regional HIE or care coordination initiatives
• Infrastructure for determining >30 day mortality rate (blinded death master file lookup)
• Infrastructure for linked multi-center trials that need a very large n across multiple centers, while preserving privacy (no patient consent required)
  – MPOG: Multicenter Perioperative Outcomes Group
    http://mpog.med.umich.edu/
  – International consortium of leading health centers
  – Supported by the Anesthesia Quality Institute
EXAMPLE: PERIOPERATIVE PERIOD
• Physically intrusive intervention
• Risky and expensive
• Very difficult to blind study individuals
• Challenging to randomize ethically
  – Difficult airway, hypotension, hypertension
• “Random” clinical decisions are rampant
  – Wide variation in practice because of few guidelines
• Challenging to recruit “priority populations”
  – Pediatrics
  – Emergency surgery with non-optimized patients
  – Racial and ethnic minorities
[Cartoon caption: “Like most veterinary students, Doreen breezes through Chapter 9.”]
RCTs: THE GOLD STANDARD… NOT FOR EVERYTHING
POST-RELEASE CONTROVERSIES
SOMETIMES NOT SO GOLDEN
• A controlled trial is not routine clinical practice
• Specific, small studies are extrapolated to the population at large
• $$$ per patient → small study
• Infrequent events → large study
Perioperative Research
RESEARCH CIRCLE OF LIFE
[Figure: cycle of Small RCT → Broad Use → Large Retrospective Review → Large RCT, returning to Small RCT]
Abstract-based medicine: have you noticed?
Blindfolded RL Web Service for Research…
[Figure: Sites 1, 2, and 3 each hold PHI and Auth/Clin Data and run a BRI Hash Service; they connect over the Internet to an Entity Lookup Service backed by the BRI (Blinded Record Index) and to a Centralized Research Core, hosted at a trusted central authority]
Summary
• Blindfolded record linking is a solution for maintaining privacy while achieving linking
• Blindfolded record linking is viable and practical
  – Currently running in production
  – Meets clinical use-case requirements, which are generally far more stringent
  – Handles large population sets
• Current efforts and architecture can be extended to include blindfolded linking
Recommendations
• Spawn distributed “blindfolded” linkage pilots and compare their performance against non-linked pilots
• Study the scale of the “overlap” and “missed” signal problem without record linkage, stratified across disease states (hypothesis: chronic disease interventions over time require linked datasets)
• Use early experience to inform a multi-year roadmap for Active Linked Surveillance
• Determine a stratification model that attempts to match the research question to data types and the need for linkage
• Publish a reference architecture for a distributed and linked query system
“It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change”
Charles Darwin