EMRDatamining
1 0 -11 0 -1
1 0 -11 0 -1
1 0 -1
thousands
millions
hundreds
engineered
TheDataandtheproblem
ßIndividu
als(N)à
ß Features(p)à
ProceduresDevicesDiseasesDrugs Sequence,Expression(gene,protein),Metabolites…
Claims Quantifiedself
Wedoalot,withoutknowingwhatworks.
1. Clinicalquestions2. Insightsfromthedata3. Predictivemodels
Understandingthequestion
Whatproportionofthepopulationhasfamilialhyperlipidemia?
Whatarethesubtypesofautism?
Istherearelationbetweengutbacteriaanddepression?
Whichpatientswillbehigh-costnextyear?
Ismitralvalverepairorreplacementbetterforpatientswithprolapse?
HowdoesandrogendeprivationincreasedementiaRisk?
5
90773 water*on*the*brain3107785 hydrocephalus*[disease/finding]42612 hydrocephaly2889104 hydrocephalus*(disorder)1936514 hydrocephalus,*unspecified2001177 hydrocephalus*nos
2933 hydrocephalus*************************
4010253C0020255
hydrocephalus
tid cui str Note+freq syn Medline+freq %+noun2933 C0020255 hydrocephalus 29,634 NNS 19,541 64.6142612 C0020255 hydrocephaly 113 NN 275 49.8190773 C0020255 water>on>the>brain 8 ROOT 1 50
Phenotyping
Clinicalstudies
Terms,Concepts
Counts
Variables
Anyallergymed?
Profilingriskfactorsforchronicuveitis PAST MEDICAL/SURGICAL HISTORY: Positive for atrial fibrillation. The patient had AVR 6 years ago. Peripheral arterial disease with hypertension, peripheral neuropathy, atherosclerosis, hemorrhoids, proctitis, CABG, and cholecystectomy.
FAMILY HISTORY: Positive for atherosclerosis, hypertension, autoimmune diseases in the family.
REVIEW OF SYSTEMS: Weight loss of 25 pounds within the last 6 months, shortness of breath, constipation, bleeding from hemorrhoids, increased frequency of urination, muscle aches, dizziness and faintness, focal weakness and numbness in both legs, knees and feet.
LABORATORY DATA AND RADIOLOGICAL RESULTS: The patient had a chest x-ray, which showed cardiomegaly with atherosclerotic heart disease, pleural thickening and small pleural effusion, a left costophrenic angle which has not changed when compared to prior examination, COPD pattern. The patient also had a head CT, which showed atrophy with old ischemic changes. No acute intracranial findings.
DISCHARGE DIAGNOSIS: Syncope.
DISCHARGE MEDICATIONS: The patient was discharged on the following medications; Cardizem 90 mg p.o. thrice daily, digoxin 0.125 mg p.o. once daily, allopurinol 100 mg two times daily, Coumadin 4 mg p.o. q.h.s., and Remeron 15 mg p.o. q.h.s.
past medical surgical history: positive for atrial fibrillation. patient avr years ago. peripheral arterial disease hypertension, peripheral neuropathy, atherosclerosis, hemorrhoids, proctitis, cabg, and cholecystectomy.
family history: positive for atherosclerosis, hypertension, autoimmune diseases family.
review of systems: weight loss pounds months, shortness of breath, constipation, bleeding hemorrhoids, increased frequency of urination, muscle aches, dizziness and faintness, focal weakness and numbness both legs, knees and feet.
laboratory data and results: patient chest x-ray, which cardiomegaly atherosclerotic heart disease, pleural thickening and small pleural effusion, left costophrenic angle which not changed compared prior examination, copd pattern. patient head ct, which atrophy old ischemic . no acute intracranial findings.
discharge diagnosis: syncope.
discharge medications: patient following medications; cardizem mg . . daily, digoxin 0.125 mg . . once daily, allopurinol 100 mg two times daily, coumadin mg . . . . ., and remeron mg . . . . .
Input Text Annotations
True Internal Representation(with some keys shown for illustration)
53 844 58539 83 16 (colon)996 31363 13 (atrial fibrillation)1 19 (period)8 75087 12129 6158 61 316091 3 (peripheral arterial disease)254 332 124624 22 21 (comma)6198 2 (atherosclerosis)2 15 (comma)2835 22 1110647 2 (proctitis)2 92026 2 (cabg)2 411 21907 4 (cholecystectomy)1 15 (period)...
Reconstructed Representation
PAST MEDICAL/SURGICAL HISTORY: Positive for atrial fibrillation. The patient had AVR 6 years ago. Peripheral arterial disease with hypertension, peripheral neuropathy, atherosclerosis, hemorrhoids, proctitis, CABG, and cholecystectomy.
FAMILY HISTORY: Positive for atherosclerosis, hypertension, autoimmune diseases in the family.
REVIEW OF SYSTEMS: Weight loss of 25 pounds within the last 6 months, shortness of breath, constipation, bleeding from hemorrhoids, increased frequency of urination, muscle aches, dizziness and faintness, focal weakness and numbness in both legs, knees and feet.
LABORATORY DATA AND RADIOLOGICAL RESULTS: The patient had a chest x-ray, which showed cardiomegaly with atherosclerotic heart disease, pleural thickening and small pleural effusion, a left costophrenic angle which has not changed when compared to prior examination, COPD pattern. The patient also had a head CT, which showed atrophy with old ischemic changes. No acute intracranial findings.
DISCHARGE DIAGNOSIS: Syncope.
DISCHARGE MEDICATIONS: The patient was discharged on the following medications; Cardizem 90 mg p.o. thrice daily, digoxin 0.125 mg p.o. once daily, allopurinol 100 mg two times daily, Coumadin 4 mg p.o. q.h.s., and Remeron 15 mg p.o. q.h.s.
past medical surgical history: positive for atrial fibrillation. patient avr years ago. peripheral arterial disease hypertension, peripheral neuropathy, atherosclerosis, hemorrhoids, proctitis, cabg, and cholecystectomy.
family history: positive for atherosclerosis, hypertension, autoimmune diseases family.
review of systems: weight loss pounds months, shortness of breath, constipation, bleeding hemorrhoids, increased frequency of urination, muscle aches, dizziness and faintness, focal weakness and numbness both legs, knees and feet.
laboratory data and results: patient chest x-ray, which cardiomegaly atherosclerotic heart disease, pleural thickening and small pleural effusion, left costophrenic angle which not changed compared prior examination, copd pattern. patient head ct, which atrophy old ischemic . no acute intracranial findings.
discharge diagnosis: syncope.
discharge medications: patient following medications; cardizem mg . . daily, digoxin 0.125 mg . . once daily, allopurinol 100 mg two times daily, coumadin mg . . . . ., and remeron mg . . . . .
Input Text Annotations
True Internal Representation(with some keys shown for illustration)
53 844 58539 83 16 (colon)996 31363 13 (atrial fibrillation)1 19 (period)8 75087 12129 6158 61 316091 3 (peripheral arterial disease)254 332 124624 22 21 (comma)6198 2 (atherosclerosis)2 15 (comma)2835 22 1110647 2 (proctitis)2 92026 2 (cabg)2 411 21907 4 (cholecystectomy)1 15 (period)...
Reconstructed Representation
1
2
Countpresent,positivementions,aboutthepatient3
T-1 T-2 .. .. .. .. .. .. .. .. .. T-iP-1 1 0 -1P-2::::
P-n
1.8million
~3million
Deferbindingtoconcepts
Unitex
NegEx andConTextPatterns
Knowledgegraph
tf df NN JJ … VP T-1 T-2 T-3
ID Term-1 150,879 90,000 0.90 0.05 … 0.03
ID :
ID Term-nSyntactictypesFrequency Semantictypes
termmentions
Event/Outcomeconcepts
JuvenileIdiopathicArthritis
:
Uveitits
Iridocylitis
OralAntihistamines
JuvenileIdiopathicArthritisICD 9 codes696.0, 714.0, 714.2, 714.3, 714.9, 720.2, 720.9
Terms:Juvenile idiopathic arthritis, JIAJuvenile rheumatoid arthritis, JRAPsoriatic arthritisJuvenile spondyloarthropathy,spondyloarthritis,enthesitis related arthritis,sacroiliitis,reactive arthritis
Terms,Concepts
Counts
Variables
Androgendeprivation&Alzheimer’srisk
0 1 2 3 4 5 6 7
Anaphylaxis
Rhabdomyolysis
Tuberculosis
Allergic> rhinitis>
Abdominal>aortic>aneurysm>
Age
Caucasian
Ever> smoker
AntiDplatelet>medication>use
AntiDcoagulant>medication>use
AntiDhypertensive>medication>use
Statin>use
History>of>cardiovascular>disease
History>of>diabetes
History>of>malignancy
ADT>use
1.06 (1.04-1.08)
http://tinyurl.com/inf-consult
antihypertensives
Intervention
Diastolicpressure<90mmHg
Outcome
A55yearoldfemaleofVietnameseheritagewithknownasthmapresentstoherphysicianwithnewonsetmoderatehypertension
MyPatient
100
DiastolicBPwithDrugA:245DiastolicBPwithDrugB:989
1 2 3 4 5 6
90
80
70
60
MmHg
months
VariablesassociatedwithOutcome
DrugA
Asthma
Ethnicity
HDL
3 40 1 2HbA1c>10%
Example– 1:Choosingdiabetesdrugs
Scenario:WhichsecondlinedrugtousefortreatingdiabeticswhohavehighHbA1conetotwomonthsafterfirstlinetreatment?
250millionpatients
Metf->Metf (n)=377Glip->Glip (n)=30
Efficacy of monotherapy (Stanford data only)
Effectivetreatmentpathways
8.0Metforminà Glipizidevs.Pioglitazone
MetforminàSitagliptin vs.Pioglitazone
Metforminà Sitagliptin vs.Glipizide
Example– 2:Choosingchemotherapy
Scenario:For55-60yearoldwhitemalepatientwithnewlydiagnosedplasmacellleukemia(PCL),whatisthedifferenceinoverallsurvivalbetweenpatientstreatedwithintensiveversuslessintensivechemotherapy?
Example– 2:Personalizedestimate
Outlineofaninformaticsconsult
Descriptivesummary• Whathappenedaftertreatment?
Makingrecommendations• Whattreatmentchoicesaretypicallymade,givenpriormedicalhistory?Whataretypicaloutcomes?
• Estimation:WhatistheeffectoftreatmentchoiceXonoutcomeY?
XPRESS- EXtraction of PhenotypesfromclinicalRecordsusingSilver Standards
!!Time!"!
ICD-9 RX Terms Labs
3. Patient history is found after first mention of keyword
1. Build Keyword list based on terms related to “Myocardial
infarction”
Pid Notes
524
1234
765
834
…
….
…
…
…
…
…
…
….
Annotations
“Myocardial infarction” (Y/N) ?
2. Find Patients with keyword mentions +" #"
T" TP# FN#F" FP# TN#
train
test
5. Classifier is built using 5-fold cross validation
4. Feature Vectors constructed after collapsing patient timeline into normalized frequency counts for all terms, ICD-9s,
prescriptions and labs
ICD9 RX Terms Labs
4. Feature Vectors constructed after collapsing patient timeline into normalized frequency counts for all terms, ICD-9s,
prescriptions and labs
Input:config.R – withtermsearchsettingsOutput:keywords.tsv andignore.tsv
Input:getPatients.R -- config.R,keywords.tsv,ignore.tsvOutput:feature_vectors.Rda
Input:buildModel.R -- config.R,feature_vectors.RdaOutput:model.Rda
Phenotype AUC Sens. Spec. PPV
DM 0.95 91% 83 % 83%
MI 0.91 89% 91% 91%
FH 0.90 76.5% 93.6% ~20%
Celiac 0.75 40 % 90% ~4 %
Phenotyping- effortprecisiontradeoff
Acc PPV Time
0.98 0.96 1900
Acc PPV Time
0.90 0.91 2hrInsights
Learningtruerangesoflabtests DETECT:DataminingEMRsToEvaluateCoincidentTesting
Qualitymetrics
• PostsurgeryDVTrates
• UrinaryIncontinenceafterprostatecancersurgery
Time between surgery and DVT/PE event (first year after surgery)
Time in 10 day intervalsFr
eque
ncy
0 20 50 80 110 140 170 200 230 260 290 320 350
025
5075
100
PredictiveModeling
ClassificationorPrediction?
Patient’sMedicalRecord
###.##
Timeof“X”
###.#####.#####.##
Riskprediction Classification
SourceoffeaturesChoiceoffeatures
Predictingdelayedhealingwounds
59,958 patients
• Age• Gender• Insurance• Zip code
180,716 wounds• Wound type• Wound location
• Dimensions• Edema• Erythema• Rubor• Other wound qualities
Each wound assessment:68 Healogics Wound Care Centers in 26states• Center Code
Clinicallyusableperformance
●
●
●
●
●
●
●
●
●
●
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00Predicted probability of delayed wound healing
Obs
erve
d fra
ctio
n of
del
ayed
wou
nd h
ealin
g
PredictingCostBlooms
Panel2013-14TotalMembers:149,038
Panel2014-15TotalMembers:274,525
Learnmodel
Applymodel
PredictingCostBlooms
N=274,525Expensivein
2014=$18,028
Expensivein2015=$20,489
60%- Bloomers
40%- Persistent
Predicted,butdon’tbloom
Bloomerswemiss
CostBloommodel
WholePopulation– HighCostmodel
Correctlypredictedbloomers
0.810.430.57
AUCPPVCostCap.
0.860.510.68
Fruitfulareasofactivity
• Riskstratification:cost,latentdisease,decompensation
• Personalizingevidence:riskforme,whattreatmentwillwork
• Insightsintodiseaseprogression• Usingpassivelycollecteddata:sensors,featureengineering
• Practicemanagement:predictingmissedappointments,medicationadherence...
Openresearchproblems
• Datanonstationarity• Localvs.Globalmodels• Handlingunstructureddata(text,images,timetraces)• Outcomeascertainment(andcensoring)• Evaluation:Beyonddiscrimination• calibration,net-reclassification
• Bridgingthe“lastmile”
AcknowledgementsGroupMembers:• Scientists: SuzanneTamang,KenJung,Alison
Callahan,JuanBanda• Fellows:RainerWinnenburg,Rohit Vashisht• Engineer: VladimirPolony• BMIStudents:SarahPoole,AlejandroSchuler,
AlbeeLing,Vibhu Agarwal,DavidStark• MedStudents:GregGaskin,Jassi Pannu
Alums:AnnaBauer-Mehren (Roche),Srini Iyer(Facebook),Amogh Vasekar (Citrix),SandyHuang(Berkeley),Paea LePendu (Lexigram),RaveHarpaz(Oracle),TylerCole(BarrowInst.),SamFinlayson(Harvard),WillChen(Yale),YenLow(Netflix),ElsieGyang (FellowshipinSurgery)
Funding:• NIH – NLM,NIGMS,NHGRI,NINDS,NCI,FDA• StanfordInternal– Dept.ofMedicine,
PopulationHealthSciences,ClinicalExcellenceResearchCenter,Dean’soffice
• Fellowships – MedScholars,SiebelScholarsFoundation,StanfordGraduateFellowship
• Industry – Apixio,CollabRx,Healogics,JanssenR&D,Oracle,Baidu USA,Amgen
IT: AlexSkrenchuk,SCCIteam