Data Linkage Project
Florida’s Newborn Screening Program
Gary Sammet
Bureau of Vital Statistics

Outline
Data Linkage Approach
Start with Probabilistic Linking
Data Linkage Automated Process Flow
Data Processing Design: Linking Variables, Weights, Bonuses, Use of Jaro-Winkler
Data Processing Sample Results

Data Linkage Approach
VS & LAB work closely together
System can accommodate needs
Reduce duplication of efforts
Reconciliation:
  All births have a screening record
  All screening records have a birth
Most cost effective with best results

Start With Probabilistic Linking
Identify linking variables - assign initial weight based on understanding & experience w/data
Run initial linking - sort by weight & display linkage flags to see data patterns/anomalies
Adjust weights as needed w/o changing code
Define deterministic rules to ensure consistent linking in automated process
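
As a rough illustration of "adjust weights as needed w/o changing code", the sketch below keeps the weights in a data table and sums them for every variable that agrees between a LAB record and a VS record. The field names, the sample records, and the `score_pair` helper are hypothetical; only the weight values come from the weight list in this deck, and exact equality stands in for whatever comparison the real system uses.

```python
# Hypothetical weight table: variable name -> weight. The values are taken from
# the deck's weight list; keeping them in data rather than in logic means they
# can be tuned without touching the scoring code.
WEIGHTS = {
    "time_of_birth": 0.85,
    "mother_ssn": 0.25,
    "sex": 0.25,
    "mother_first_name": 0.20,
    "plurality": 0.10,
}


def score_pair(lab: dict, vs: dict, weights: dict = WEIGHTS):
    """Return (total weight, per-variable linkage flags) for one candidate pair."""
    flags = {
        var: lab.get(var) is not None and lab.get(var) == vs.get(var)
        for var in weights
    }
    total = sum(weights[var] for var, hit in flags.items() if hit)
    return round(total, 2), flags


# Made-up records, for illustration only.
lab = {"time_of_birth": "0412", "mother_ssn": None, "sex": "F",
       "mother_first_name": "ANA", "plurality": 1}
vs = {"time_of_birth": "0412", "mother_ssn": "123456789", "sex": "F",
      "mother_first_name": "ANNA", "plurality": 1}

total, flags = score_pair(lab, vs)
print(total, flags)
```

Printing the flags next to the total is what makes patterns visible when candidate pairs are sorted by weight: here the missing SSN and the ANA/ANNA spelling difference both show up as unmatched variables.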

Data Linkage Automated Process
Flow chart key: orange = data store, blue = process.
Process steps:
Start
Drop Indexes
Truncate Staging Tables
Load Staging Table LABS
Unduplicate LABS data based on Original Specimen
Data Conversions for Linking
Create Indexes on Staging
Extract Birth data based on DOB and Index
Link Data With Blocking Factor=DOB
Update flag fields
Update weighted score and flags using Jaro-Winkler
Update weighted score adding bonus for combinations
Unduplicate record pairs keeping highest weighted score
Link Unlinked LAB Data With Blocking Factor=DOB Year and Month
Deterministic Linking
Stop
Data stores:
SourceStaging
LABS.LifeCycle
VitalStat.Birthmaster
HPE.VitalStat.Birthlink
VitalStat.Lablink2Undup
VitalStat.SourceStaging
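
The blocking and unduplication steps in the flow can be sketched as follows. This is a minimal in-memory illustration, not the production database process: the record layout, the `toy_score` function, and the `link_with_blocking` helper are all assumptions. The first pass compares only records with the same DOB; LAB records left unlinked get a second pass blocked on DOB year and month, and each LAB record keeps only its highest-scoring VS candidate.

```python
from collections import defaultdict


def link_with_blocking(lab_rows, vs_rows, score, key):
    """Compare only LAB/VS rows that share a blocking key and keep the
    highest-scoring VS candidate for each LAB record."""
    blocks = defaultdict(list)
    for vs in vs_rows:
        blocks[key(vs)].append(vs)

    best = {}  # LAB id -> (score, VS id)
    for lab in lab_rows:
        for vs in blocks.get(key(lab), []):
            s = score(lab, vs)
            if lab["id"] not in best or s > best[lab["id"]][0]:
                best[lab["id"]] = (s, vs["id"])
    return best


def toy_score(lab, vs):
    """Stand-in for the weighted score: counts agreeing fields."""
    return sum(lab[k] == vs[k] for k in ("sex", "time"))


# Made-up records, for illustration only.
lab_rows = [
    {"id": "L1", "dob": "2010-12-01", "sex": "F", "time": "0412"},
    {"id": "L2", "dob": "2010-12-02", "sex": "M", "time": "1130"},
]
vs_rows = [
    {"id": "V1", "dob": "2010-12-01", "sex": "F", "time": "0412"},
    {"id": "V2", "dob": "2010-12-01", "sex": "F", "time": "2200"},
    {"id": "V3", "dob": "2010-12-03", "sex": "M", "time": "1130"},
]

# Pass 1: blocking factor = full DOB.
first = link_with_blocking(lab_rows, vs_rows, toy_score, key=lambda r: r["dob"])

# Pass 2: unlinked LAB rows only, blocking factor = DOB year and month.
unlinked = [r for r in lab_rows if r["id"] not in first]
second = link_with_blocking(unlinked, vs_rows, toy_score,
                            key=lambda r: r["dob"][:7])
print(first, second)
```

Blocking keeps the number of comparisons manageable, and the looser second pass gives records with a DOB discrepancy a chance to link. A real run would also apply a minimum score before accepting a pair, which this sketch omits.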

Linking Variables & Weights
Time of birth: 0.85
Facility Name and Zipcode: 0.75
Facility Name w Jaro-Winkler score .899+ and match ZipCode: 0.65
Facility Name w Jaro-Winkler score .800-.898 and match ZipCode: 0.55
Facility Name w Jaro-Winkler score .899+ and match Facility City: 0.65
Facility Name w Jaro-Winkler score .800-.898 and match Facility City: 0.55
Facility Name w Jaro-Winkler score .899+ and Facility Address w JW score .85+: 0.65
Facility Name w Jaro-Winkler score .800-.898 and Facility Address w JW score .85+: 0.55
Facility Address w Jaro-Winkler score .85+ and match Facility City: 0.65
Infant Last Name w JW score .899+ match to Last Name/Mother Last Name: 0.65
Infant Last Name w JW score .800-.898 match to Last Name/Mother Last Name: 0.30
Mother Current Last Name: 0.25
Mother SSN: 0.25
Mother Address: 0.25
Mother Full Name (bonus): 0.25

Linking Variables & Weights (continued)
Sex of Infant: 0.25
Infant Full Name (bonus): 0.25
Infant First Name w JW score .76+ and Infant Last Name w JW score .85+: 0.20
Mother First Name: 0.20
Mother First Name w JW score .76+ and Mother Last Name w JW score .85+: 0.20
Mother Address w JW score .85+: 0.20
Current Address to Mother Address w JW score .899+ and match City: 0.20
Current Address to Mother Address w JW score .85+ and match Mother First Name: 0.20
Infant Full Name w JW score of .85+ (bonus): 0.15
Mother Full Name w JW score of .85+ (bonus): 0.15
Father Last Name: 0.10
Plurality: 0.10
Birth Order: 0.10
Infant First Name: 0.05
Infant Last Name: 0.05

Weight Bonuses
DOB, Time of Birth, Sex, Facility + Zipcode (MFirst or MSSN): BONUS = .50
DOB, Time of Birth, Sex, Facility-JW + Zipcode (MFirst or MSSN): BONUS = .40
DOB, Time of Birth, Sex, Facility + Zipcode: BONUS = .20
DOB, Time of Birth, Sex, Facility-JW + Zipcode: BONUS = .15
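
One way to read the four bonus rules is as a small decision function over the linkage flags. The sketch below assumes that "(MFirst or MSSN)" means a match on mother's first name or mother's SSN, that only the single highest applicable bonus is awarded, and that an exact facility match takes precedence over a Jaro-Winkler facility match. The flag names and the `combination_bonus` helper are hypothetical.

```python
def combination_bonus(flags: dict) -> float:
    """Return the bonus for a candidate pair, given boolean linkage flags."""
    core = (flags["dob"] and flags["time_of_birth"]
            and flags["sex"] and flags["zipcode"])
    if not core:
        return 0.0
    mother = flags["mother_first_name"] or flags["mother_ssn"]
    if flags["facility_exact"]:
        return 0.50 if mother else 0.20
    if flags["facility_jw"]:
        return 0.40 if mother else 0.15
    return 0.0


# Made-up flag sets, for illustration only.
exact_with_mother = dict(dob=True, time_of_birth=True, sex=True, zipcode=True,
                         facility_exact=True, facility_jw=False,
                         mother_first_name=False, mother_ssn=True)
jw_without_mother = dict(dob=True, time_of_birth=True, sex=True, zipcode=True,
                         facility_exact=False, facility_jw=True,
                         mother_first_name=False, mother_ssn=False)

print(combination_bonus(exact_with_mother))   # 0.5
print(combination_bonus(jw_without_mother))   # 0.15
```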

Variables By % Linked
DOB: 99.79
SEX: 97.65
BIRTH ORDER: 97.39
PLURALITY: 94.40
MOTHER FULL NAME + JW: 94.37
TIME OF BIRTH: 93.22
MOTHER FIRST NAME: 92.13
MOTHER LAST NAME: 88.57
MOTHER FULL NAME: 82.54
MOTHER SSN: 82.06
TOTAL FACILITY - JW: 78.32
TOTAL MOTHER ADDRESS, CITY - JW: 73.08
MOTHER ADDRESS, CITY - JW: 56.58
LNAME: 43.59
FACILITY: 41.48
FACILITY - JW: 36.84
FATHER LAST NAME: 35.75
MOTHER ADDRESS, CITY: 16.50
MOTHER FULL NAME - JW: 11.83
INFANT FULL NAME: 11.70

Linking With Jaro-Winkler
With exact Facility + Zipcode match: 41.48% (Facility & Zipcode must match)
With Jaro-Winkler Facility + Zipcode match: additional 36.84%
Total match = 78.32% vs. just 41.48% with exact matching
Examples:
LAB FACILITY NAME              VS FACILITY NAME
FLORIDA HOSP ORLANDO – LAB     FLORIDA HOSP ORLANDO
SHANDS AT THE UNIV OF FLA      SHANDS AT UF
BROWARD MED CTR                BROWARD MEDICAL CENTER
SHANDS AT JACKSONVILLE         SHANDS JACKSONVILLE
HOLLYWOOD BIRTH CENTER, INC    HOLLYWOOD BIRTH CENTER
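
For readers unfamiliar with the metric, here is a minimal textbook Jaro-Winkler implementation, followed by a hypothetical helper that maps a facility-name comparison onto the Facility + Zipcode weight tiers from the weight list (0.75 exact, 0.65 for JW .899+, 0.55 for JW .800-.898). It is a sketch, not the Bureau's code: the prefix scale of 0.1, the four-character prefix cap, the treatment of the tiers as mutually exclusive, and the ZIP codes used in the example are assumptions. An ASCII hyphen replaces the dash in the first facility name.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half-transpositions.
    a = [c for c, m in zip(s1, matched1) if m]
    b = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(a, b)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def facility_zip_weight(lab_name, vs_name, lab_zip, vs_zip) -> float:
    """Hypothetical mapping onto the Facility + Zipcode weight tiers."""
    if lab_zip != vs_zip:
        return 0.0
    if lab_name == vs_name:
        return 0.75
    jw = jaro_winkler(lab_name, vs_name)
    if jw >= 0.899:
        return 0.65
    if jw >= 0.800:
        return 0.55
    return 0.0


# ZIP codes here are made up for illustration.
jw = jaro_winkler("FLORIDA HOSP ORLANDO - LAB", "FLORIDA HOSP ORLANDO")
w = facility_zip_weight("FLORIDA HOSP ORLANDO - LAB", "FLORIDA HOSP ORLANDO",
                        "32803", "32803")
print(round(jw, 4), w)
```

The extra " - LAB" suffix defeats an exact comparison, but the shared prefix and the 20 matching characters keep the pair in the top Jaro-Winkler tier under these assumptions.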

Linking Mother Address & City
Only 16.50% match on exact mother address & city
Additional 56.58% match on mother address & city, using Jaro-Winkler
Total match: 73.08% vs. just 16.50% with exact matching
Examples:
LAB Mother Address    VS Mother Address     LAB City     VS City
2323 SAMSON ROAD      2323 SAMSON RD        ORLANDO      ORLANDO
5105 NE 75TH AVE      5105 NE 75 AVENUE     MIAMI        MIAMI
1001 MAIN ST APT A    1001 MAIN ST APT A    KEY WEST     KEY WEST
532 HORNET CT         532 HORNET COURT      PENSACOLA    PENSACOLA
101 MAGIC CIR         101 MAGIC CIRCLE      TAMPA        TAMPA

Data Processing Results
LAB Data with DOB 12/1 - 12/31/2010, Unduplicated On OrigSpecID: 9,211 rows
VS Data with DOB 11/1 - 12/31/2010, Unduplicated on State File Number: 37,741 rows
99% Unduplicated & Linked Records with weighted score > 2.5

Overall Linkage Results
98 - 99% using back-end approach
Still not good enough
Follow Rhode Island front-end approach

Advantages of Front-end Linkage
Provide real-time linkage at hospital with VS Birth Date & NBS demographic data
Reduces data entry by hospital staff
Provide daily report of unlinked/missing records
Provide LAB w/checklist of incoming blood specimens
Reduce follow-up by state staff to hospitals
Allow end-users (hospitals, MDs) ability to view electronic patient reports/results in real-time

Acknowledgements
Ken Jones, Bureau Chief/Deputy State Registrar, Bureau of Vital Statistics
Sharon Dover, Operations Manager, Bureau of Vital Statistics
Paula Stewart, Database Analyst, Health Statistics & Assessment