Patient Matching A-Z
Wednesday, March 2nd 2016
Adam W. Culbertson, Innovator-in-Residence HHS, HIMSS
Overview
• Overview of Innovator-in-Residence Program
• Background on Patient Matching
• Challenges to Matching
• Evaluation of Matching Algorithms and Metrics

Innovator-in-Residence (IIR) Program
• Brings entrepreneurial individuals into HHS through collaboration with private and not-for-profit organizations
• HIMSS-funded fellow working in collaboration with the HHS CTO's office, the IDEA Lab, and the Office of the National Coordinator for Health IT
• The Patient Identification and Matching Final Report identified patient matching as a critical barrier to interoperability
• Two-year fellowship, August 2014 to August 2016
[Diagram: all pieces needed for interoperability — data Generation, Structure, Storage, Transport, and Merge — framed by governance policy considerations. This is where standards are important.]
Significant Dates in (Patient) Matching
• 1918: Soundex, US Patent 1,261,167
• 1946: Dunn, Record Linkage
• 1959: Newcombe, Kennedy & Axford, Automatic Linkage of Vital Records
• 1969: Fellegi & Sunter, A Theory for Record Linkage
• 2002: Grannis et al., Analysis of Identifier Performance Using a Deterministic Linkage Algorithm
• 2008: Campbell et al., A Comparison of Link Plus, The Link King, and a "Basic" Deterministic Algorithm; RAND Health report, Identity Crisis: An Examination of the Costs and Benefits of a Unique Patient Identifier for the US Health Care System
• 2009: HIMSS, Patient Identity Integrity; Grannis et al., Privacy and Security Solutions for Interoperable Health Information Exchange
• 2011: Winkler, Matching and Record Linkage; HIMSS Patient Identity Integrity Toolkit, Patient Key Performance Indicators
• 2014: Joffe et al., A Benchmark Comparison of Deterministic and Probabilistic Methods for Defining Manual Review Datasets in Duplicate Records Reconciliation; Dusetzina et al., Linking Data for Health Services Research: A Framework and Instructional Guide; Audacious Inquiry and ONC, Patient Identification and Matching Final Report; HIMSS hires Innovator-in-Residence (IIR) focused on patient matching
• 2015: A Framework for Cross-Organizational Patient Identity Management; Kho et al., Design and Implementation of a Privacy Preserving Electronic Health Record Linkage Tool
Patient Matching Definition
• Patient matching: comparing data from multiple sources to identify records that represent the same patient.
• In healthcare, this involves matching varied demographic fields from different health databases to create a unified view of a patient.
Identity Matching / Identity Resolution
[Diagram: a layered stack, from bottom to top]
• Structured and unstructured data sources
• Identity data repository
• Attribute matching: compare name, DOB, COB, address, etc. (see the sketch below)
• Identity matching: measure record similarity; search/retrieval
• Identity resolution: merge/dedupe records
• Identity analysis: link analysis, data mining
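To make attribute matching concrete, here is a minimal sketch in Python. It is a hypothetical example: the records and field names are invented, and the standard library's difflib stands in for a production comparator such as Jaro-Winkler.

```python
from difflib import SequenceMatcher

# Hypothetical demographic attributes being compared.
FIELDS = ("first", "last", "dob")

def similarity(a: dict, b: dict) -> float:
    """Average per-field string similarity between two records."""
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in FIELDS) / len(FIELDS)

rec1 = {"first": "Jonathan", "last": "Smith", "dob": "1980-03-02"}
rec2 = {"first": "Jon",      "last": "Smyth", "dob": "1980-03-02"}
print(round(similarity(rec1, rec2), 2))  # ~0.78: similar enough to send to review
```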
% Availability of Attributes Over Region
[Bar chart: percent availability (0–100%) of each demographic attribute at Sites A, B, and C. Attributes compared: First Name, Middle Name, Last Name, Date of Birth, Birth Year, Gender, Social Security Number, Address (full), Street Address Line 1, City, State, Postal Code, Country Abbreviation, Country Full Name, Phone Number (any), Home Phone Number, Cell Phone Number, Work Phone Number, Email Address, Nickname, Insurance Number (free text), Drivers License Number, Race (OMB), Race (free text), Ethnicity, Language, Occupation, Income, Marital Status, Height (cm), Height (m), Height (in), Height (ft), Weight (lbs), Weight (kg), Blood Type.]
Data Quality
• Data quality is key: garbage in, garbage out
• Data entry errors compound data-matching complexity
– Various algorithmic solutions address these, but none are perfect
• Types of errors:
– Missing or incomplete values
– Inaccurate data
– Fat-finger errors
– Out-of-date information
– Transposed names
– Misspelled names
Data Quality
• Transposition errors
– Mary Sue vs Sue Marie
– Smith, John vs John, Smith
• Names change over time
– Marriage, divorce
• More than one way to spell a name
– Jon, John
• Data entry
– Fat-finger = typo, transposition, etc.
• Phonetic variation (see the Soundex sketch below)
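Phonetic variation is what phonetic encodings such as Soundex (the 1918 patent on the timeline above) try to absorb. A minimal sketch of American Soundex in Python follows; a production system would rely on a vetted library implementation.

```python
def soundex(name: str) -> str:
    """Encode a name as one letter plus three digits (American Soundex sketch)."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    first, digits, prev = name[0].upper(), "", codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # drop immediate repeats of the same code
            digits += code
        if ch not in "hw":          # h and w do not separate repeated codes
            prev = code
    return (first + digits + "000")[:4]

# 'John' and 'Jon' collide, so phonetic variants can be grouped together:
assert soundex("John") == soundex("Jon") == "J500"
```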
Patient Matching Goal
• The ideal outcome of any matching exercise is correctly answering one question, hundreds or thousands of times: are these two things the same thing?
– Correctly identify all the true positives and true negatives while minimizing the errors: false positives and false negatives
Patient Matching Terminology
• True Positive: the two records represent the same patient
• True Negative: the two records don't represent the same patient
• False Negative: the algorithm misses a record pair that should be matched
• False Positive: the algorithm links two records that don't actually match
Evaluation

EHR A    | EHR B    | Truth (Gold Standard) | Algorithm | Match Type
Jonathan | Jonathan | Match                 | Match     | True Positive (good)
Jonathan | Sally    | Non-Match             | Non-Match | True Negative (good)
Jonathan | Sally    | Non-Match             | Match     | False Positive (bad)
Jonathan | Jon      | Match                 | Non-Match | False Negative (bad)
Evaluation

                     | Truth: Positive | Truth: Negative
Algorithm: Positive  | True Positive   | False Positive
Algorithm: Negative  | False Negative  | True Negative
Evaluation
• Precision = True Positives / (True Positives + False Positives)
• Recall = True Positives / (True Positives + False Negatives)
• Tradeoffs between precision and recall: F measure (see the sketch below)
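A minimal sketch of these metrics in Python, using hypothetical counts; the F measure (defined later in this deck) is included to show how the tradeoff is summarized.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """F = (beta^2 + 1)*P*R / (beta^2*P + R); beta > 1 weights recall more."""
    denom = beta ** 2 * p + r
    return (beta ** 2 + 1) * p * r / denom if denom else 0.0

# Hypothetical matching run: 90 correct links, 10 spurious links, 30 missed links.
p, r = precision(tp=90, fp=10), recall(tp=90, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f_measure(p, r):.2f}")
# precision=0.90 recall=0.75 F1=0.82
```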
Summary
• Patient matching is an old problem
• Need to understand the data attributes available
• Understand their quality
• Follow a systematic approach to evaluation
– Methodology to create ground truth data
– Metrics: precision, recall
Development of Test Data Set
[Flow diagram: Patient Database → Select Potential Matches (aka Adjudication Pool) → Manual Reviewers 1, 2, and 3 → Human-Reviewed Match Decisions (Answer Key == Ground Truth Data Set) → Compare Algorithm and Test Data Set. A majority-vote sketch follows.]
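One simple way to turn the three reviewers' judgments into the answer key is a majority vote. A minimal sketch, assuming Python; real adjudication workflows (such as IMAC, shown later) are more involved.

```python
from collections import Counter

def adjudicate(votes: list[str]) -> str:
    """Majority label across manual reviewers for one candidate pair."""
    label, _ = Counter(votes).most_common(1)[0]
    return label

# Three hypothetical reviewer judgments for the same record pair:
print(adjudicate(["match", "match", "non-match"]))  # match
```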
Development of Ground Truth Sets
• Identify a data set that reflects the real-world use case
• Develop potential duplicates
• Human adjudication review and classification: match or non-match
• Estimate truth
– Pooled methods using multiple matching methods (see the sketch below)
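A minimal sketch of the pooled approach, assuming Python; the two matchers here (exact name, shared last name) are hypothetical stand-ins for real matching methods whose flagged pairs are unioned into the adjudication pool.

```python
from itertools import combinations

records = [
    {"id": 1, "first": "John",  "last": "Smith"},
    {"id": 2, "first": "Jon",   "last": "Smith"},
    {"id": 3, "first": "Sally", "last": "Jones"},
]

def exact_name(a, b):
    return (a["first"], a["last"]) == (b["first"], b["last"])

def same_last(a, b):
    return a["last"] == b["last"]

def adjudication_pool(recs, matchers):
    """Union of pairs flagged by any matcher; each pair then goes to human review."""
    return {(a["id"], b["id"])
            for a, b in combinations(recs, 2)
            if any(m(a, b) for m in matchers)}

print(adjudication_pool(records, [exact_name, same_last]))  # {(1, 2)}
```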
[Diagram: spellings and pronunciations cross-link; one sound maps to many spellings and one spelling to many sounds, e.g. /li/: 'Li', 'Lee', 'Leigh'; /le.ɑ/ or /li.ɑ/: 'Leah'; /lei̯/: 'Lay', 'Laye'; /lai̯/: 'Lie', 'Ligh'.]
Quoi?
Patient Names (Answers)
• Jean Rimbaud (OK, or John…)
• Leigh Cramer
• Alice Slawson
I don't know what your neighbors' names are… but did you get them right? Did you get the *whole* name right?
Identity Matching Adjudication Collector (IMAC) User Interface
One screen of the Adjudication Collector continually presents questions for the adjudicator to answer. Each question is first asked with no dates provided, then asked again with dates shown.
Issues In Establishing Ground Truth
• Different truth for different applications
– Credit check
– Security applications
– Customer support
– De-duplication of mailing lists
• What is the cost of missing a match?
– New record entered into database
– Irritated customer
– Lives are lost
• Criteria for truth must be carefully established and well understood by annotators
– The question posed to annotators must be carefully phrased
Issues In Establishing Ground Truth
• How much time / expertise is available to judge (or discount) false positives?
• Needs to reflect the real-world use case
• Evaluation results are only as good as the truth on which they are based
– And only as appropriate as the evaluation is to the task that will be performed with the operational system
• Absolute recall is impossible to measure without a completely known test set (i.e., "You don't know what you're missing.")
– Estimate with pooled results
Summary for Healthcare Use Case
• The first step in evaluation is to determine why the evaluation is being conducted
• Different truth for different applications
– Security applications vs patient health record
• What is the cost of missing a match?
– Security: lives are lost
– Health: patient safety event, missed medications, allergies, etc., even death… but this is the situation today
• What is the cost of wrongly identifying a match?
– Security: passenger is inconvenienced / delayed
– Health: patient safety event, wrong medication or treatment, liability, death
• Criteria for truth must be carefully established and well understood
– E.g., the question posed to annotators must be carefully phrased
The Trade-off Between False Positive and False Negative Matches
• As the match score threshold is raised, false positives decrease but false negatives increase (increasing precision)
• As the match score threshold is lowered, false negatives decrease but false positives increase (increasing recall)
(See the threshold sweep sketch below.)
Source: Grannis, S. Introduction to Record Linkage. September 27, 2012
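A minimal sketch of the tradeoff, assuming Python and hypothetical scored pairs; raising the cutoff trades false positives for false negatives exactly as described above.

```python
# Each pair is (similarity score from the matcher, whether it is truly the same patient).
pairs = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
         (0.60, False), (0.40, True), (0.30, False)]

for threshold in (0.50, 0.75, 0.85):
    tp = sum(1 for s, same in pairs if s >= threshold and same)
    fp = sum(1 for s, same in pairs if s >= threshold and not same)
    fn = sum(1 for s, same in pairs if s < threshold and same)
    print(f"threshold={threshold:.2f}  FP={fp}  FN={fn}")
# threshold=0.50  FP=2  FN=1
# threshold=0.75  FP=1  FN=2
# threshold=0.85  FP=0  FN=2
```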
Basic IR Metrics: Precision and Recall
[Worked example]
"Subject": MAHMOUD ABDUL HAMEED, 12/10/1945
"Target list" candidates: MOREY APPLEBAUM, MOHAMMED ABDUL HAMID, MAHMOUD ABD EL HAMEED, MAKMUD ABDUL HAMID, MAHMOUD ABD ALHAMID
The system returns Y records, of which X are true positives; Z 'true' answers exist in the target list. The remaining returns are false positives, and the true answers not returned are false negatives.
• Precision (P) = X/Y; here 2 of the 4 returned records were true matches: P = 2/4
• Recall (R) = X/Z; here 2 of the 3 true matches were found: R = 2/3
Precision and Recall Inversely Related (1)
[Diagram: within the database, the system-returns circle widens to cover more of the 'true' hits]
• Recall increased, but precision fell
• The 'low-hanging fruit' phenomenon: more false hits come in for every true one

Precision and Recall Inversely Related (2)
[Diagram: the system-returns circle shrinks around the 'true' hits]
• Precision increased, but recall fell
• More selective matching
What Makes a Good Evaluation?
• Objective: gives unbiased results
• Replicable: gives the same results for the same inputs
• Diagnostic: can give information about system improvement
• Cost-efficient: does not require extensive resources to repeat
• Understandable: results are meaningful to the appropriate people
• Well-documented: also contextualizes results in terms of the purpose of the evaluation and the task
IMAC – Admin Interface
An administrative screen allows managing IMAC users as well as the questions asked of them, including setting the priority of questions and the number of judges used for each question.
Evaluation: Like IR Tasks
• Metrics
– F-measure: harmonic mean of precision and recall
F = (β² + 1) · P · R / (β² · P + R)
where P = precision = correct system responses / all system responses
R = recall = correct system responses / all correct reference responses
β = beta factor, which provides a means to control the importance of recall over precision
– Additional measures
• False positives: items identified as correct responses that are not correct responses (= 1 − Precision)
• False negatives: correct responses not identified (= 1 − Recall)
• Fallout = non-relevant responses returned / all non-relevant reference responses (related to, but not directly calculable from, precision and recall)

Algorithm Tuning
• Issue: annotation standard for development of ground truth
• Large effects on performance due to algorithm tuning
• Tuning is need-specific
• Setting cut-offs
– Upper thresholds
– Feature selection
– Feature weighting
• Blocking (see the sketch below)
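Blocking limits full pairwise comparison to records that share a cheap key. A minimal sketch, assuming Python; the blocking key here (first letter of last name plus birth year) is an illustrative choice, not a recommendation.

```python
from collections import defaultdict

def block(records):
    """Group records by a blocking key; only records in the same block are compared."""
    blocks = defaultdict(list)
    for rec in records:
        key = (rec["last"][:1].upper(), rec["birth_year"])
        blocks[key].append(rec)
    return blocks

records = [
    {"last": "Smith", "birth_year": 1980},
    {"last": "Smyth", "birth_year": 1980},
    {"last": "Jones", "birth_year": 1975},
]
for key, members in block(records).items():
    print(key, [r["last"] for r in members])
# ('S', 1980) ['Smith', 'Smyth']  <- only this pair gets the full comparison
# ('J', 1975) ['Jones']
```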
Framework for Evaluation: EAGLES 7-Step Recipe / ISLE FEMTI*
1. Define the purpose of the evaluation: why the evaluation is being done
2. Elaborate a task model: what tasks are to be performed with the data
3. Define top-level quality characteristics
4. Produce detailed system requirements
5. Define metrics to measure requirements
6. Define techniques to measure metrics
7. Carry out and interpret the evaluation

Originally developed as an evaluation framework for machine translation, but the authors note that it can serve as a generic evaluation framework.

*Acronyms: EAGLES – European Advisory Group on Language Engineering Standards; ISLE – International Standards for Language Engineering; FEMTI – Framework for the Evaluation of Machine Translation in ISLE (http://www.issco.unige.ch/femti)