LINKING DATA ACROSS TIME AND SOURCES Vik Kheterpal, MD March 23, 2012 Principal [email protected]



MY CAB RIDE ON THE WAY IN…


MY CHARGE

1. What are the state-of-the-art approaches to linking data across multiple sources?

2. How will these approaches contribute to increasing opportunities?

3. What are the short and long term challenges?

4. What are the top 2-3 things that need to happen?

5. Who are the relevant stakeholders that must be engaged?


MY PLAN

1. A bit of reflection and philosophy: where have we been?

2. Link between “linking” or connecting healthcare and unlocking the value proposition from HIT investments

3. Solution description

4. Deep work not just Deep thinking – this is real and works now

5. Suggestions for areas where we need help


LINKING DATA ACROSS TIME AND SPACE (CONNECTING HEALTHCARE)


HIT EVOLUTION – FOCUS ON DIGITIZING PAPER

All Paper → Comprehensive Clinicians Desktop

Paper Medical Records / Back Office Automation
• Innovation in “filing systems”
• Lab and Pharmacy departments automated
• No systems for clinicians
• HIS Systems (lab, pharma, ADT) & silo solutions – OB
• Limited enterprise workflow
• No IT integration

Best of Breed Solutions
• Fragmented, clinical area specific solutions: ICU, ED, OB, OR…
• Minimal integration
• Legacy IT architectures
• Failed execution on integration and site configuration

Comprehensive Clinicians Desktop (“Epic”)
• Integrated, comprehensive on-line medical record
• Full HIS/administrative integration
• Decision Support and evidence based guidelines
• Web, wireless, mobile – anytime anywhere.

[Figure: timeline labeled 1980s, 1990s, Current Generation, Right Now; system labels include RIS, Lab, ADT, Pharmacy, Paper Med Recs, PACS, ICU, OB, ED, Transcription, OR, HIS]


CHALLENGES – REAL AND PERCEIVED ARE MANY: RECENT BLOG ENTRY


IT: “THE GREAT MAGNIFIER”

Crap process + Technology = Fast Crap

“I like to describe technology as the great magnifier. The challenge is that it will magnify both the good and bad elements of your processes. Fix the process before you apply the technology.” (From: 101 EMR and EHR Tips)

RISE OF THE DOTS… DIGITIZATION OF HEALTHCARE OVER THE LAST 30 YEARS

• Facility Focus: Just Get My Doctors and Nurses on-line
• Traditional HIT Focus: EMR, PMIS, Departmental, HIT etc.
• Pace of Adoption Quickening
• Fragmented Healthcare: Geographic and Sub-specialization
• Trends Continue
• Islands of Automation

DUBIOUS INHERENT ROI… “FREE MARKET” STATE PRIOR TO HITECH

• 20% penetration = 80% failure rate

• $39 Billion incentive (“bribe”) from the feds to adopt technology: “meaningful use”

• Why?


OFFICE AUTOMATION IN THE 1980s: PRODUCTIVITY GAP AFTER $1T SPEND?


CLINICAL CONTEXT – A REMINDER

• Chronic disease has replaced acute disease as the dominant health problem

• Chronic disease is now the principal cause of disability & use of health services

• Chronic disease consumes 78% of health expenditures

Holman H. JAMA Vol 292, No.9, Sept 1, 2004


SO… WHY SHOULD WE CARE? ACUTE vs. CHRONIC

• Acute = episodic
  – commonly a cure
  – inexperienced, passive patient
  – physician administers treatment

• Chronic = continuous
  – rarely a cure
  – patient lives indefinitely with the disease, symptoms & consequences
  – persistent treatment
  – patient has integral role in the treatment process
  – multiple caregivers involved in process over long periods of time

Holman H. JAMA Vol 292, No.9, Sept 1, 2004


WHY EXCHANGE OR LINKING?

• The nature of care changes over time

• Must be managed over time as disease evolves with shifting severity, pace & treatments

• Good management is an unfolding process, best provided by a multi-disciplinary team of professionals

• Continuity & coordination of care are essential

• Information about the patient needs to move through the “System” as fast as the patient does.


PRACTICE OF CHRONIC CARE MEDICINE REQUIRES A DIFFERENT APPROACH

• The nature of care changes over time

• Must be managed over time as disease evolves with shifting severity, pace & treatments

• Good management is an unfolding process, best provided by a multi-disciplinary team of professionals

• Continuity & coordination of care are essential

• Information about the patient needs to move through the “System” as fast as the patient does.


THE ULTIMATE GOAL: QUALITY … REQUIRES INTEROPERABLE DATA

MISSION: Have access to longitudinal patient history before clinical decisions are made

• Improved Coordination of Care
• Reduced Costs
• Reduced Fragmentation
• Improved Outcomes


CONNECTED HEALTH SYSTEMS…

The goal is not to move from “paper silos” to “electronic silos”

The goal is an electronic health system that supports and requires the movement of interoperable health information supporting:

•Continuity of Care

•Population Needs (pandemics and other disasters)

•Bench to Bedside Research

•Disability Determination

In a January 9, 2009 speech at George Mason University:

“To improve the quality of our healthcare while lowering its cost, we will make the immediate investments necessary to ensure that, within five years, all of America’s medical records are computerized.”


THE “POST-EMR” ERA IS UPON US

• Just like “post-PC”

• The EMR is just a means of digitizing information; by ITSELF it is not sufficient to accrue transformational systemic value


IT’S NOT ABOUT THE DATA/HIT ITSELF: IT IS WHAT WE DO WITH IT AND HOW


THE “POST-EMR” ERA IS UPON US

• Visualization

• Episode Grouping

• Transformation/Translation

• White Space Mgmt – not about digitizing “clinical encounters” – TOC/Care Coordination

• Extrapolation

• Actionable Information and Analytics – Prospective; not just retrospective


vSPR – “Expedia for Healthcare”

Problem Summary List

Care Timeline

All sites of care


Today’s Problem Summary List

On Longitudinal Basis Complex Patients Can be Too Difficult to “Understand” Quickly

Care Timeline

Procedure / Med Hx

Currently Distinct From the Problems


Medical Episode Groups / Disease Clusters

• ASTHMA MAINTENANCE
• ASTHMA W/ COMPLICATION
• TRAUMA – FRACTURE

Disease Clusters to compress the “row noise”

Crisp – Cogent Display

Can see Forest Through The Trees Now

Can Compress Display overall


Step 1: Demonstrates the ability to qualify initiatives, e.g. readmits are 1% of patients and 11% of costs.


Step 2: Validating and calibrating a predictive model. Demonstrates our ability to implement an industry standard model (LACE) and calibrate the implementation of the model in a care coordination initiative (i.e. select a score cutoff that translates into a manageable number of cases for the case management team).
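As a rough illustration of Step 2, the sketch below scores discharges with LACE (Length of stay, Acuity of admission, Comorbidity, Emergency department visits) and then picks the lowest cutoff that keeps the flagged caseload within a team's capacity. The point values follow the commonly published LACE scheme and should be checked against the original publication before use; the `Discharge` type, `calibrate_threshold`, and the capacity figure are assumptions made for this example, not the production implementation described in the deck.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Discharge:
    length_of_stay_days: int   # L: days in hospital
    acute_admission: bool      # A: emergent/acute admission
    charlson_index: int        # C: Charlson comorbidity index
    ed_visits_6mo: int         # E: ED visits in the prior 6 months


def lace_score(d: Discharge) -> int:
    """Return the LACE score (0-19) for one discharge."""
    los = d.length_of_stay_days
    if los < 1:
        l_pts = 0
    elif los <= 3:
        l_pts = los
    elif los <= 6:
        l_pts = 4
    elif los <= 13:
        l_pts = 5
    else:
        l_pts = 7
    a_pts = 3 if d.acute_admission else 0
    c_pts = d.charlson_index if d.charlson_index <= 3 else 5
    e_pts = min(d.ed_visits_6mo, 4)
    return l_pts + a_pts + c_pts + e_pts


def calibrate_threshold(scores: Iterable[int], capacity: int) -> int:
    """Lowest cutoff such that the number of cases with score >= cutoff
    does not exceed the case management team's capacity (hypothetical rule)."""
    scores = list(scores)
    for cutoff in range(0, 21):          # 20 flags nobody, so the loop always returns
        if sum(1 for s in scores if s >= cutoff) <= capacity:
            return cutoff
    return 20


cohort_scores = [3, 8, 10, 12, 15, 19]   # made-up scores for illustration
cutoff = calibrate_threshold(cohort_scores, capacity=2)
```

Lowering the capacity raises the cutoff, which is the calibration trade-off the slide describes: the model stays fixed while the operating point is tuned to what the team can actually work.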


Step 3: A predictive model in production at CHNw: Operationalize the intervention


Real time reporting is integrated with the full patient record… case managers have zero hunting/gathering.


Step 4: The effectiveness of a proposed care coordination intervention can be tracked over time… so that iterative improvements can be made.


NEED FOR LINKING: A PICTORIAL

• Profound and Basic Need
  – Subject Duplication
  – Longitudinal Follow up
    • Acute Events in CDa
    • Prescription Hx in EMR1
    • ER in CD1
    • Genome in CD2
  – Can’t really first anonymize and then link

• Issue
  – Privacy Concern (Not TPO)
  – Business Issues

[Figure labels: Claims Dataset A, EMR1, EMR1, Claims Dataset B]


WARNING : ACRONYMS AHEAD


RECORD LINKING BASICS

• Methodology
  – Typically pair-wise comparisons
    • Vikas Kheterpal vs. Vick Khetarpal
    • Vick Khetarpal vs. Victor Kheterpal
  – Probabilistic versus deterministic

• Issues in General
  – False positives (reduce specificity)
  – False negatives (reduce sensitivity)

• Underlying Data Characteristics
  – Dirtiness
  – Transliteration
  – Uniformity of datasets

• Name Issues (alias, ethnicity, marriage)
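To make the pair-wise comparison concrete, here is a minimal sketch that scores the name pairs from the slide. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the deck does not say which string metric is used, and the 0.8 threshold is an arbitrary value chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after lower-casing."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deterministic_match(a: str, b: str) -> bool:
    """Deterministic rule: names must agree exactly after trimming and lower-casing."""
    return a.strip().lower() == b.strip().lower()


def probabilistic_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Score-based rule: tolerate spelling variation above a threshold (assumed value)."""
    return name_similarity(a, b) >= threshold


names = ["Vikas Kheterpal", "Vick Khetarpal", "Victor Kheterpal"]

# Every record is compared with every other record: n * (n - 1) / 2 pairs.
pair_scores = {
    (a, b): name_similarity(a, b) for a, b in combinations(names, 2)
}

for (a, b), score in pair_scores.items():
    print(f"{a!r} vs {b!r}: {score:.2f} exact={deterministic_match(a, b)}")
```

None of the three pairs passes the exact rule, which is the practical argument for a score-based comparison on dirty name data; the price is that the threshold now governs the false positive and false negative balance.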


HEALTHCARE SPECIFIC ISSUES

• Intolerance for false positives

• Lack of uniformity of identifiers

• Multitude of data sources
  – More than pair-wise links
  – N-way links within a network

• Core identifiers/Minimum versus tie-breakers
  – Core: Recipient#, SSN, Name, Gender, DOB
  – Supplementary: address, phone, etc.

• Intentionally dirty data


LINKAGE METHODOLOGIES

Deterministic
• Rapid Implementation
• Simple Calculation
• Relies on Accurate Data, typically
• May not function well with other data sets

Probabilistic/Statistical
• Complex Implementation
• Computationally intensive
• More Forgiving of Data Errors
• Algorithm is customized to data being linked


NAME SCALING


CryptoRLS™ MPI

• Combination of deterministic and probabilistic

• Core probabilistic algorithm: proven on the largest data sets in the world
  – US Census Bureau
  – SAMHSA (federal agency)

• Built from the ground up for specificity
  – Alias names
  – Transliteration
  – Address standardization
  – Consistency checker


World Class Record Linking: High Specificity and Sensitivity Proven

[Diagram components: Record To Link, Record Store, Blocked Records, Definite Match Linker, Possible Match Linker, Consistency Checker, Definite Matches, Possible Matches]

Possible Match Linker

  TotalScore =
      SSN sim * SSN wt +
      FN sim * FN wt +
      LN sim * LN wt +
      DOB sim * DOB wt +
      Gender * Gender wt +
      Race * Race wt +
      ZipCode * ZipCode wt + …

  IF TotalScore > Threshold THEN Match

Definite Match Linker

  IF
      ( SSN match AND DOB match AND Gender match AND
        ( FirstName 80% sim OR LastName 90% sim ) )
    OR
      ( FirstName 80% sim AND LastName 90% sim AND
        DOB match AND Gender match AND
        ( ZipCode match OR Race match ) )
    OR …
  THEN Match

Blocking Query

  IF
      SSN match
    OR ( FirstName match AND LastName match )
    OR ( DOB match AND Gender match )
  THEN Block
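The three rule sets on this slide can be read as a small pipeline: block first, then try the definite rules, then fall back to a weighted score. The sketch below is one way to write that down, under several assumptions: the `Person` type, the weights, and the threshold are invented for illustration, "80% sim" is read as a similarity of at least 0.8, and `difflib.SequenceMatcher` stands in for whatever similarity function the real system uses. Rules elided on the slide are not reconstructed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    ssn: str
    first: str
    last: str
    dob: str
    gender: str
    zipcode: str = ""
    race: str = ""


def sim(a: str, b: str) -> float:
    """String similarity in [0, 1]; a missing value never counts as similar."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same(a: str, b: str) -> bool:
    """Exact agreement on a non-empty value."""
    return bool(a) and a == b


def blocked(a: Person, b: Person) -> bool:
    """Blocking query: cheap filter deciding which pairs are compared at all."""
    return (same(a.ssn, b.ssn)
            or (same(a.first, b.first) and same(a.last, b.last))
            or (same(a.dob, b.dob) and same(a.gender, b.gender)))


def definite_match(a: Person, b: Person) -> bool:
    """The two definite-match rules shown on the slide."""
    fn, ln = sim(a.first, b.first), sim(a.last, b.last)
    demo = same(a.dob, b.dob) and same(a.gender, b.gender)
    rule1 = same(a.ssn, b.ssn) and demo and (fn >= 0.8 or ln >= 0.9)
    rule2 = (fn >= 0.8 and ln >= 0.9 and demo
             and (same(a.zipcode, b.zipcode) or same(a.race, b.race)))
    return rule1 or rule2


# Hypothetical weights (sum to 1.0) and threshold; the slide gives neither.
WEIGHTS = {"ssn": 0.35, "first": 0.15, "last": 0.20, "dob": 0.15,
           "gender": 0.05, "race": 0.05, "zipcode": 0.05}
THRESHOLD = 0.7


def total_score(a: Person, b: Person) -> float:
    """Weighted sum of per-field similarities, as in the Possible Match Linker."""
    return (sim(a.ssn, b.ssn) * WEIGHTS["ssn"]
            + sim(a.first, b.first) * WEIGHTS["first"]
            + sim(a.last, b.last) * WEIGHTS["last"]
            + sim(a.dob, b.dob) * WEIGHTS["dob"]
            + float(same(a.gender, b.gender)) * WEIGHTS["gender"]
            + float(same(a.race, b.race)) * WEIGHTS["race"]
            + float(same(a.zipcode, b.zipcode)) * WEIGHTS["zipcode"])


def possible_match(a: Person, b: Person) -> bool:
    return total_score(a, b) > THRESHOLD
```

Blocking matters mostly for cost: without it, every incoming record would be scored against the whole record store.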


The Problem: Centralized MPI Is a Critical Weak Link for Privacy

1. Data aggregation increases value to attackers – one stop … all citizens

2. Large number of entities with legitimate need to access the RLS… increases vulnerability

3. “discoverability” of information by government agencies

4. Threat from within

5. Fostering “trust” amongst competing provider entities.


“BLINDFOLDED RECORD LINKING”

1. Just as the Bank does not know the contents of the safety deposit box, with Crypto-RLS you can provide linking without even KNOWING the contents

2. No risk in managing entire regional population

3. No clinical data centralization

4. Protects from
  • Internal threats
  • Disgruntled employees
  • External hacks
  • Inadvertent loss (theft, backup distribution)

5. Web Services provide a “catcher-pitcher” handoff


Use Cryptographic Techniques Like Hashing to Obfuscate the PHI

“The SHA hash functions are a set of cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm.” (Wikipedia)

“The four hash algorithms specified in this standard are called secure because, for a given algorithm, it is computationally infeasible 1) to find a message that corresponds to a given message digest, or 2) to find two different messages that produce the same message digest” (NIST FIPS 180-2, Secure Hash Standard, published 2002)


How To Perform Effective “linking” and Hashing

• Misspellings would generate completely different “hashes” or fingerprints

• Different permutations depending on type of identifier
  – Names: Bigrams, Nicknames, NYSIIS codes, etc.
    • Bigram: All subsets of the set of 2 consecutive characters in a string.
    • Example: Pete -> {“pe”, “et”, “te”}, {“pe”, “et”}, {“pe”, “te”}, {“et”, “te”}, {“pe”}, {“et”}, {“te”}
  – Numeric: Transpositions, Off-By-One, etc.
  – Date: Month-day swap, Off-by-one, etc.

• Permutations provide ability to partial-match identifiers even though we’re blinded.
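The bigram idea can be sketched in a few lines. The example below generates every non-empty subset of a name's bigrams and hashes each subset with SHA-256 from Python's standard `hashlib`. The canonical form (sorted bigrams joined with `|`) and the function names are assumptions for illustration; the deck does not describe the actual encoding.

```python
import hashlib
from itertools import combinations


def bigrams(name: str) -> list[str]:
    """Distinct pairs of consecutive characters, lower-cased and sorted."""
    s = name.lower()
    return sorted({s[i:i + 2] for i in range(len(s) - 1)})


def bigram_subsets(name: str) -> list[tuple[str, ...]]:
    """All non-empty subsets of the bigram set, largest first."""
    grams = bigrams(name)
    subsets = []
    for size in range(len(grams), 0, -1):
        subsets.extend(combinations(grams, size))
    return subsets


def blind(subset: tuple[str, ...]) -> str:
    """One-way hash of a subset; the '|' canonical form is an assumption."""
    canonical = "|".join(subset)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sha256_text(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# A one-letter difference gives unrelated whole-name digests...
whole_name_equal = sha256_text("pete") == sha256_text("petr")

# ...but the two names still share hashed bigram subsets, e.g. {"et", "pe"}.
pete = {blind(s) for s in bigram_subsets("Pete")}
petr = {blind(s) for s in bigram_subsets("Petr")}
shared = pete & petr
```

Note that the number of subsets grows as 2^n - 1 in the number of bigrams, so a production system would presumably cap or prune the subsets it stores; the deck does not say how.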


Blinded Record Index Steps

Input: records at participating institutions with identifiers (First Name, Last Name, SSN, DOB, …)

Within Each Site Adapter
1) Standardizer pass (delimiters, junk values, spaces)
2) Create permutations of identifiers with associated similarity score (NYSIIS, Soundex, transpositions, nicknames, …)
3) One-way hash the permutations

At the “Center”
4) Persist hashed permutations in the BRI
5) Find other records in the BRI with matching permutations
6) Link to other records based on identifier similarities
7) Group linked records into Patients
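A minimal sketch of how the work might be split between a site adapter and the central index is shown below, covering steps 1 through 5 only. Everything named here (`site_adapter`, `BlindedRecordIndex`, the toy "drop last character" permutation and its 0.7 score) is hypothetical; the point is simply that only digests and scores leave the site.

```python
import hashlib
import re
from collections import defaultdict


def standardize(value: str) -> str:
    """Step 1: strip delimiters, spaces, and case differences."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def permutations_for(value: str) -> list[tuple[str, float]]:
    """Step 2 (toy version): the exact value plus one hypothetical variant."""
    perms = [(value, 1.0)]
    if len(value) > 1:
        perms.append((value[:-1], 0.7))   # "drop last character", score assumed
    return perms


def site_adapter(record_id: str, record: dict[str, str]) -> list[tuple]:
    """Steps 1-3, run inside the institution: returns only hashed rows."""
    rows = []
    for field, raw in record.items():
        value = standardize(raw)
        if not value:
            continue
        for perm, score in permutations_for(value):
            digest = hashlib.sha256(f"{field}:{perm}".encode("utf-8")).hexdigest()
            rows.append((record_id, field, digest, score))
    return rows


class BlindedRecordIndex:
    """Steps 4-5, run at the center: stores and matches digests, never PHI."""

    def __init__(self) -> None:
        self._by_hash = defaultdict(list)   # (field, digest) -> [(record_id, score)]

    def persist(self, rows: list[tuple]) -> None:
        for record_id, field, digest, score in rows:
            self._by_hash[(field, digest)].append((record_id, score))

    def candidates(self, rows: list[tuple]) -> set[str]:
        found = set()
        for record_id, field, digest, _score in rows:
            for other_id, _ in self._by_hash.get((field, digest), ()):
                if other_id != record_id:
                    found.add(other_id)
        return found


bri = BlindedRecordIndex()
rows_a = site_adapter("A1", {"first": "Pete", "ssn": "123-45-6789"})
bri.persist(rows_a)
rows_b = site_adapter("B7", {"first": "PETE ", "ssn": "123456780"})
matches = bri.candidates(rows_b)
```

Steps 6 and 7 (scoring the candidate pairs and grouping them into patients) would run on the candidate set this returns.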


Permutation Generation

• Different permutations depending on type of identifier
  – Names: Bigrams, Nicknames, NYSIIS codes, etc.
    • Bigram: All subsets of the set of 2 consecutive characters in a string.
    • Example: Pete -> {“pe”, “et”, “te”}, {“pe”, “et”}, {“pe”, “te”}, {“et”, “te”}, {“pe”}, {“et”}, {“te”}
  – Numeric: Transpositions, Off-By-One, etc.
  – Date: Month-day swap, Off-by-one, etc.

• Permutations provide ability to partial-match identifiers even though we’re blinded.
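For the numeric and date identifiers, the permutations named above might look like the sketch below. The function names are invented, and applying off-by-one to every digit (and only to the year for dates) is an assumption; the deck lists the permutation types without specifying their scope.

```python
def transpositions(digits: str) -> set[str]:
    """Swap each pair of adjacent, differing characters."""
    out = set()
    for i in range(len(digits) - 1):
        if digits[i] != digits[i + 1]:
            out.add(digits[:i] + digits[i + 1] + digits[i] + digits[i + 2:])
    return out


def off_by_one(digits: str) -> set[str]:
    """Change one digit up or down by one, staying within 0-9."""
    out = set()
    for i, ch in enumerate(digits):
        for delta in (-1, 1):
            n = int(ch) + delta
            if 0 <= n <= 9:
                out.add(digits[:i] + str(n) + digits[i + 1:])
    return out


def date_variants(year: int, month: int, day: int) -> set[tuple[int, int, int]]:
    """Month-day swap (when the result is a plausible month) and year +/- 1.

    Returned as (year, month, day) tuples; calendar validity, e.g. 29 February,
    is not checked in this sketch.
    """
    out = set()
    if day <= 12 and day != month:
        out.add((year, day, month))
    out.add((year - 1, month, day))
    out.add((year + 1, month, day))
    return out


ssn_variants = transpositions("123456789") | off_by_one("123456789")
dob_variants = date_variants(2003, 1, 2)
```

Each variant would then be hashed and stored with a score below 1.0, so that a typo at one site can still meet the correct value from another site.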


Permutation and Hashing

Record
  First Name   Pete
  Last Name    Doe
  DOB          1/2/03
  SSN          123-45-6789

Permute: First Name
  Permutation           Score
  {“et”, “pe”, “te”}    1.0
  {“et”, “pe”}          0.7
  {“et”, “te”}          0.7
  “peter”               0.8
  …                     …

Permute: SSN
  Permutation   Score
  123456789     1.0
  123456798     0.8
  123456788     0.9
  12345678x     0.7
  …             …

Permute: DOB
  Permutation   Score
  1/2/03        1.0
  2/1/03        0.7
  1/2/04        0.7
  …             …

Hash (applied to each permutation table; one example shown)
  Hash            Score
  0x0123ab49c..   1.0
  0xa84f04…       0.7
  0x7fb885..      0.7
  0x530de8..      0.8
  …               …


Similarity Calculation

Record A

  FName Hash      Score
  0xbb23a4b9c..   1.0
  0xa84f04…       0.8
  0xbe8785..      0.7
  0x530de8..      0.7
  …               …

  DOB Hash        Score
  0x0123ab49c..   1.0
  0xa74f04…       0.7
  0x7fb885..      0.7
  0x530de8..      0.8
  …               …

  SSN Hash        Score
  0x530de8..      1.0
  0xafb885…       0.7
  0xdea745..      0.7
  0x8..           0.8
  …               …

Record B

  FName Hash      Score
  0x5123ab49c..   1.0
  0x8af404…       0.7
  0xfee885..      0.7
  0xa84f04..      0.8
  …               …

  DOB Hash        Score
  0x0123ab49c..   1.0
  0xa74f04…       0.7
  0x7fb885..      0.7
  0x530de8..      0.8
  …               …

  SSN Hash        Score
  0x1f23ab49c..   1.0
  0xa84f04…       0.7
  0xafb885..      0.7
  0x530de8..      0.8
  …               …

A↔B Similarity

  Identifier   Score
  First Name   0.64
  Last Name    0.9
  DOB          1.0
  SSN          0.8
  …            …

[Figure annotations: Matches, Highest Scoring Match, Lower Scoring Matches]
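The numbers on this slide are consistent with a simple rule: for each identifier, find the hashes the two records share, multiply the two scores attached to each shared hash, and keep the highest product (0.8 × 0.8 = 0.64 for first name, 1.0 × 0.8 = 0.8 for SSN). That rule is inferred from the figures rather than stated in the deck, so treat the sketch below as a reading of the slide. The truncated hash prefixes are used purely as opaque dictionary keys.

```python
def identifier_similarity(a_hashes: dict[str, float],
                          b_hashes: dict[str, float]) -> float:
    """Highest product of scores over hashes present in both records."""
    shared = a_hashes.keys() & b_hashes.keys()
    if not shared:
        return 0.0
    return max(a_hashes[h] * b_hashes[h] for h in shared)


# Hash prefix -> permutation score, copied from the slide's tables.
record_a = {
    "first_name": {"0xbb23a4b9c": 1.0, "0xa84f04": 0.8, "0xbe8785": 0.7, "0x530de8": 0.7},
    "dob":        {"0x0123ab49c": 1.0, "0xa74f04": 0.7, "0x7fb885": 0.7, "0x530de8": 0.8},
    "ssn":        {"0x530de8": 1.0, "0xafb885": 0.7, "0xdea745": 0.7, "0x8": 0.8},
}
record_b = {
    "first_name": {"0x5123ab49c": 1.0, "0x8af404": 0.7, "0xfee885": 0.7, "0xa84f04": 0.8},
    "dob":        {"0x0123ab49c": 1.0, "0xa74f04": 0.7, "0x7fb885": 0.7, "0x530de8": 0.8},
    "ssn":        {"0x1f23ab49c": 1.0, "0xa84f04": 0.7, "0xafb885": 0.7, "0x530de8": 0.8},
}

similarity = {
    field: round(identifier_similarity(record_a[field], record_b[field]), 2)
    for field in record_a
}
```

For SSN, the records also share `0xafb885` (0.7 × 0.7 = 0.49), which is one of the "lower scoring matches" that the maximum discards.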


Link Creation

A↔B Similarity

  Identifier   Score
  First Name   0.64
  Last Name    0.9
  DOB          1.0
  SSN          0.8
  …            …

Rule Based

  Condition                                    Outcome
  FName>.6 + LName>.8 + SSN>.7 + DOB=1         DefiniteLink
  SSN>.7 + DOB>.9 + LName>.8                   PossibleLink
  …                                            …

Result: Definite Link (A – B)
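The rule table can be expressed directly as code. The sketch below encodes the two rules shown, reading "+" as a logical AND (an assumption) and adding a hypothetical "NoLink" outcome for pairs that satisfy neither rule; rules elided on the slide are not guessed at.

```python
def classify_link(scores: dict[str, float]) -> str:
    """Apply the slide's two rules, most specific first."""
    fname = scores.get("first_name", 0.0)
    lname = scores.get("last_name", 0.0)
    ssn = scores.get("ssn", 0.0)
    dob = scores.get("dob", 0.0)

    if fname > 0.6 and lname > 0.8 and ssn > 0.7 and dob == 1.0:
        return "DefiniteLink"
    if ssn > 0.7 and dob > 0.9 and lname > 0.8:
        return "PossibleLink"
    return "NoLink"   # hypothetical label, not on the slide


# The A-B similarity scores from the slide.
result = classify_link(
    {"first_name": 0.64, "last_name": 0.9, "dob": 1.0, "ssn": 0.8}
)
```

Ordering the rules from strictest to loosest means a pair that satisfies both is reported at the stronger level.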


Consistency Checker Pass

• Typical RL approach: pair-wise links (even with advanced Bayesian probabilistic algorithms)
  – Only as good as core algorithm
  – Weakest link creates issues

• Healthcare: n-way linkage
  – A1: Bithika S. Kheterpal, 11/8/68, F, SSN1
  – A2: Bithika S. Malhotra, 11/8/68, F, SSN1
  – A3: Bithika S. Malhotra, 11/8/68, F, missing SSN
  – A1=A2; A2=A3 but A1<>A3

• Consistency Checker will promote the A1=A3 link.

• Impact: HUGE improvement in sensitivity without sacrificing specificity
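One common way to turn pair-wise links into n-way groups is a union-find (disjoint-set) pass, sketched below on the A1/A2/A3 example. Whether the CryptoRLS consistency checker works this way is not stated in the deck; this only illustrates how A1 and A3 can end up in the same patient group even though their direct comparison fails.

```python
def group_records(record_ids: list[str],
                  links: list[tuple[str, str]]) -> list[set[str]]:
    """Group records into connected components of the pair-wise link graph."""
    parent = {r: r for r in record_ids}

    def find(x: str) -> str:
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[str, set[str]] = {}
    for r in record_ids:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())


# A1=A2 (same SSN), A2=A3 (same name and DOB); no direct A1-A3 link.
patients = group_records(
    ["A1", "A2", "A3", "B1"],
    [("A1", "A2"), ("A2", "A3")],
)
```

A plain transitive closure will also propagate a single bad link through a whole group, so a checker aimed at specificity would presumably validate the implied links (for example, rejecting a group whose members carry conflicting SSNs) before promoting them.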


Production Use: Three Use Cases

• Infrastructure for robust statewide or regional HIE or care coordination initiatives

• Infrastructure for determining >30 day mortality rate (blinded death master file lookup)

• Infrastructure for linked multi-center trials that need a very large n across multiple centers while preserving privacy (no patient consent required)


THREE STATES – SIMILAR NEEDS

• 32,000 square miles; 4.01 million population; bordering NC and Georgia

• 24,231 sq miles; 1.9M population; landlocked, bordering Ohio, PA, Kentucky, Virginia

• 52,423 sq miles; 4.45M population; bordering Mississippi, TN, Georgia, Florida

1. Algorithm powers the statewide HIE infrastructure in three states

2. Connections to claims (Medicaid/CHIP), public health (IZ, ELR), and provider clinical data (hospitals, clinics)

3. Gateway to federal partners (VA etc), and Interstate Partners

4. Production use for clinical use – certainly acceptable for population/secondary uses


How Do We Know It Works?

• Live in three states – dozens of hospitals for clinical care

• SCHIEx – 6.2M patients over 10 years from > 200 distinct facilities

• Weekly Statistical Sample Manually Reviewed

• Statistical Process compared with Clinical Process ; clinical RLS has higher specificity

• No reported false positives in 3 years of production use.


ANOTHER APPLICATION – DETERMINING MORTALITY RATE PAST 30 DAYS

– How to manage 30+ day mortality information for “your” clinical intervention?
  • Individual contact: very expensive operationally, with dubious results
  • Other approaches like death database lookups can be more efficient

– National Databases
  • SSA publishes the Death Master File
  • NTIS distributes a variety of methods of looking up information

– How to subscribe and use this lookup without disclosing PHI
  • Most of the patients you look up will still be alive (hopefully ☺)
  • How to do it with minimal IT headache


DMF Background


Two Basic Approaches For Lookup

• Lookup using various websites
  – Individual lookup
  – Submit files
  – Web based queries
  – All suffer from 1 basic issue
    • You must share your PHI with an external entity to perform the lookup

• Download the file and build a tool internally
  – No disclosure of your PHI to external entities
  – Issue: IT effort + approx $5K in master file license fees annually


Privacy Protecting Lookup

• Death database lookup
  – Application to convert PHI to one-way hashed code
  – Updated to use SHA-256 hashing algorithm
  – Pings central server for lookup
  – Match or no-match + date of death returned
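The lookup flow above can be sketched as follows: the facility hashes identifiers locally with SHA-256, sends only the digest, and the central server answers match or no-match with a date of death. Which identifiers are hashed, how they are normalized, and the names `blind_key` and `BlindedDeathIndex` are all assumptions for this example; the data is made up.

```python
import hashlib
from typing import Iterable, Optional


def blind_key(ssn: str, dob: str) -> str:
    """Client side: normalize identifiers and return a SHA-256 hex digest."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    canonical = f"{digits}|{dob.strip()}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class BlindedDeathIndex:
    """Server side: a death file indexed by the same one-way hash."""

    def __init__(self, entries: Iterable[tuple[str, str, str]]) -> None:
        # entries: (ssn, dob, date_of_death); only digests are kept as keys.
        self._index = {blind_key(ssn, dob): dod for ssn, dob, dod in entries}

    def lookup(self, key: str) -> tuple[bool, Optional[str]]:
        """Return (matched, date_of_death); date is None when there is no match."""
        dod = self._index.get(key)
        return (dod is not None, dod)


# Fictional entry standing in for a death master file record.
server = BlindedDeathIndex([("123-45-6789", "1940-01-02", "2011-06-30")])

# The facility sends only the digest, never the SSN or DOB.
deceased = server.lookup(blind_key("123456789", "1940-01-02"))
alive = server.lookup(blind_key("999-99-9999", "1950-05-05"))
```

One design consideration: identifiers such as SSNs have a small value space, so an unkeyed hash can be reversed by enumeration. A keyed hash (for example HMAC with a secret shared among participants) is a common mitigation; the deck does not say whether one is used.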


Self Service Tool To Invoke Lookup


Sample Text File Within Your Facility


Sample Results File


SIMPLE TO ADOPT INTO RESEARCH PROTOCOL

• Get URL from WEBSITE

• Download Hashing Widget from web site

• Prepare Text File

• Submit File

• Get Answer Back


Production Use: Three Use Cases

• Infrastructure for robust statewide or regional HIE or care coordination initiatives

• Infrastructure for determining >30 day mortality rate (blinded death master file lookup)

• Infrastructure for linked multi-center trials that need a very large n across multiple centers while preserving privacy (no patient consent required)
  – MPOG: Multicenter Perioperative Outcomes Group, http://mpog.med.umich.edu/
  – International consortium of leading health centers
  – Supported by the Anesthesia Quality Institute


Ex. PERIOPERATIVE PERIOD

• Physically intrusive intervention

• Risky and expensive

• Very difficult to blind study individuals

• Challenging to ethically randomize
  – Difficult airway, Hypotension, Hypertension

• “Random” clinical decisions rampant
  – Wide variation in practice because of few guidelines

• Challenging to recruit “priority populations”
  – Pediatrics
  – Emergency surgery w/ non-optimized patient
  – Racial & Ethnic Minorities


RCTs: THE GOLD STANDARD… NOT FOR EVERYTHING

[Cartoon caption: “Like most veterinary students, Doreen breezes through Chapter 9.”]


POST RELEASE CONTROVERSIES


SOMETIMES NOT SO GOLDEN

• Controlled trial is not routine clinical practice

• Specific, small study extrapolated to population at large

$$$ / patient → small study

Infrequent events → large study

Perioperative Research


RESEARCH CIRCLE OF LIFE

Small RCT → Broad Use → Large Retrospective Review → Large RCT → Small RCT

Abstract-based medicine: have you noticed?


Blindfolded RL Web Service for Research…

[Diagram labels: Site 1, Site 2, and Site 3, each with PHI, Auth/Clin Data, and a BRI Hash Service; Internet; Entity Lookup Service and BRI (Blinded Record Index), hosted at a Trusted Central Authority; Centralized Research Core]


Summary

• Blindfolded record linking is a way to maintain privacy while still achieving linking

• Blindfolded record linking is viable and practical
  – Currently running in production
  – Meets clinical use case requirements, which are generally far more stringent
  – Large population sets

• Current efforts/architecture can be extended to include blindfolded linking


Recommendations

• Spawn distributed “blindfolded” linkage pilots and compare performance against non-linked pilots

• Study scale of the “overlap” and “missed” signal problem without record linkage, stratified across disease states (hypothesis: chronic disease interventions over time require linked datasets)

• Utilize early experience to inform a multi-year roadmap for Active Linked Surveillance

• Determine a stratification model that attempts to match research question to data types and need for linkage

• Publish reference architecture for distributed and linked query system


“It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change”

Charles Darwin