1
Discovery of Temporal Patterns in Course-of-Disease Medical Data
Jorge C. G. Ramirez, Ph.D. Candidate
Lynn L. Peterson and Diane J. Cook, Supervising Professors
3
Objective
• Discover patterns that represent groups of patients that had a similar course of disease for a catastrophic or chronic illness
• Motivation
– Medical
– AI
4
Contributions
• Data Preprocessing
– Normalization
– Learning Missing Data
– Learning Implicit Knowledge
• Exploratory Analysis
– Event Set Sequence Approach
5
Contributions
• Domain Understanding
– New perspective on mass of data
– Identify groups of patients for further medical study
6
Approach
• Example Events
– Laboratory Results
461 L WBC 2.70
461 L HCT 40.10
461 L PLT 239.00
461 L CD4% 19.00
461 L CD4A 188.00
7
Approach
• Example Events
– Visits
468 C CV
– Diagnoses
468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED
– Pharmacy
469 P CTM 60 CO-TRIMOXAZOLE DS
469 P AZT 200 ZIDOVUDINE 100MG
8
Approach
• Event Set Sequences
– Events
• Value Event: laboratory test result, visit
• Duration Event: pharmacy, diagnosis
– Event Set is all Events that occur in a window of time
– Event Set Sequence is all Event Sets that occur over a long period of time
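The definitions of Event, Event Set, and Event Set Sequence can be sketched in Python. This is a minimal illustration, not the authors' implementation: the slides do not state the window width or how windows are aligned, so the 14-day width and the choice to start the first window at the earliest event are assumptions, as are the names `Event`, `parse_event`, and `build_event_set_sequence`. The record layout (day number, type code, remaining fields) follows the example events on the earlier slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    day: int        # day number, e.g. 461 in "461 L WBC 2.70"
    kind: str       # type code: L (laboratory), C (visit), D (diagnosis), P (pharmacy)
    fields: tuple   # remaining tokens, left uninterpreted in this sketch


def parse_event(line):
    """Split one raw record into day, type code, and the remaining tokens."""
    tokens = line.split()
    return Event(int(tokens[0]), tokens[1], tuple(tokens[2:]))


def build_event_set_sequence(events, window_days=14):
    """Group events into consecutive windows of an assumed fixed width.

    Windows are anchored at the earliest event (an assumption).
    Returns the event sets in time order; empty windows are skipped.
    """
    events = list(events)
    if not events:
        return []
    start = min(ev.day for ev in events)
    buckets = defaultdict(list)
    for ev in events:
        buckets[(ev.day - start) // window_days].append(ev)
    return [buckets[k] for k in sorted(buckets)]


raw = [
    "461 L WBC 2.70",
    "461 L HCT 40.10",
    "468 C CV",
    "469 P AZT 200 ZIDOVUDINE 100MG",
]
events = [parse_event(r) for r in raw]
sequence = build_event_set_sequence(events)
```

With the assumed 14-day window, all four records fall into a single event set; a narrower window would split the laboratory results from the later visit and pharmacy records.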
9
Approach
• Example Event Set
461 L WBC 2.70
461 L HCT 40.10
461 L PLT 239.00
461 L CD4% 19.00
461 L CD4A 188.00
468 C CV
468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED
469 P CTM 60 CO-TRIMOXAZOLE DS
469 P AZT 200 ZIDOVUDINE 100MG
10
Approach
• Normalization
– Normal for each patient is different
– Especially when affected by a catastrophic or chronic illness
– Example: CD4A
• General Population Normal: 416 - 1751
• Well HIV-positive patient: 200 - 350
• Severely immune-compromised patient: 0 - 50
11
Approach
• Normalization (continued)
– Scale to -4…0…+4
• 0 is normal
• Each number represents a deviation from normal
• 1 and 2 are noticeable but not severe
• 3 is severe
• 4 is very severe
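One way to picture the -4…+4 scale is the sketch below. The slides do not say how a raw value is converted into a deviation level, so the rule used here (distance outside the patient's normal range, divided by a fixed step and rounded up, then capped at 4) is an assumption, as are the function name `normalize` and the step of 50 in the example. The range 200 - 350 is the "well HIV-positive patient" CD4A range from the previous slide.

```python
import math


def normalize(value, low, high, step):
    """Map a raw value to an integer deviation level in -4..+4.

    0 means the value lies inside the patient's normal range [low, high].
    Outside the range, each started `step` adds one level (assumed rule),
    with the sign giving the direction and the magnitude capped at 4.
    """
    if low <= value <= high:
        return 0
    if value < low:
        return -min(4, math.ceil((low - value) / step))
    return min(4, math.ceil((value - high) / step))


# CD4A for a well HIV-positive patient: normal range 200 - 350 (from the slides),
# with a hypothetical step of 50 cells per deviation level.
levels = [normalize(v, 200, 350, 50) for v in (250, 188, 40, 600)]
```

Under these assumptions a CD4A of 188 is a mild deviation (-1) for this patient, even though it is far below the general-population range, which is the point of normalizing per patient.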
12
Approach
• Replace Missing Data
– Diagnosis data very incomplete
– Learn severity of condition from pharmacy data
– Induce decision tree to classify conditions
13
Approach
• Create Health Status Categories
1 = HIV-positive asymptomatic
2 = Asymptomatic, on anti-HIV therapy
3 = Immune-compromised, on prophylactic therapy
4 = Active illness
5 = Severe active illness
14
Approach
• Learn Implicit Knowledge
– Need to augment explicit knowledge
– Recovery time is expert’s implicit knowledge
– Use neural network to learn recovery time function
• 0 = Nothing to recover from
• 1-4 = weeks to recover
• 5 = 5 or more weeks to recover
15
Approach
• Categorize Pharmacy Data
– A myriad of drugs prescribed
– Need to understand significance
– Categorize by use
16
Approach
• Categories
– Nucleoside Analogs
– Protease Inhibitors
– Prophylaxis Therapies
– Intravenous antibiotics
– Anti-virals
– Anti-PCP/Toxoplasmosis
– Anti-mycobacterials
18
Approach
• Result: Understandable representation of patient data
861 C 1.1 26.1 167 0.0 0 16 0
862 0.0 0.0 0 0.0 0 0 2 24: 30 38: 50
867 H 4.3 19.2 144 0.0 0 11 3 0: 3 22: 1 35: 2
868 H 2.2 26.2 144 0.0 0 5 3 0: 3 22: 1 35: 2
869 0.0 0.0 0 0.0 0 0 1 35: 60
874 C 1.3 32.4 0 0.0 0 17 0
889 C 1.1 30.4 154 0.0 0 36 0
890 0.0 0.0 0 0.0 0 0 3 22: 30 38: 50 39:480
923 0.0 0.0 0 0.0 0 0 1 39:480
933 H 3.6 20.4 182 0.0 0 11 3 0: 2 22: 1 39: 12
19
Approach
• Result: Understandable representation of patient data
861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0
867 H 4 4 0 -4 -1 -9 -9 -2 0 0 2 0 0 0 1 1 0 0
868 H 4 1 -2 -3 -1 -9 -9 -4 0 0 2 0 0 0 1 1 0 0
874 C 4 3 -4 -1 -9 -9 -9 0 0 0 2 0 0 0 1 1 0 0
889 C 4 2 -4 -2 -1 -9 -9 2 0 0 2 0 0 0 1 1 0 0
933 H 4 4 0 -4 0 -9 -9 -2 0 0 1 0 0 0 0 2 0 0
20
Approach
• Result: Understandable representation of patient data
<
{ (EV C)(HS 3)(RT 1)(WBC -4)(HCT -3)(PLT 0)(LMPH -1)(onD 0010000000) }
{ (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT -1)(LMPH -2)(onD 0010001100) }
{ (EV H)(HS 4)(RT 1)(WBC -2)(HCT -3)(PLT -1)(LMPH -4)(onD 0010001100) }
{ (EV C)(HS 4)(RT 3)(WBC -4)(HCT -1)(onD 00010001100) }
{ (EV C)(HS 4)(RT 2)(WBC -4)(HCT -2)(PLT -1)(LMPH 2)(onD 0010001100) }
{ (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT 0)(LMPH -2)(onD 0010000100) }
>
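The bracketed notation on this slide is regular enough to read mechanically. The following is a minimal parsing sketch, assuming the grammar is exactly what the example shows: a sequence of `{ ... }` event sets, each holding `(NAME value)` features. The function name `parse_sequence` and the decision to keep `EV` and `onD` as strings while converting every other feature to an integer are illustrative choices, not something the slides specify.

```python
import re

EVENT_SET = re.compile(r"\{(.*?)\}", re.S)               # one "{ ... }" group
FEATURE = re.compile(r"\(\s*(\w+)\s*([^()]*?)\s*\)")     # one "(NAME value)" pair


def parse_sequence(text):
    """Return one dict per event set, in the order they appear."""
    # Some extracted copies use an en dash where a minus sign was meant.
    text = text.replace("\u2013", "-")
    sequence = []
    for body in EVENT_SET.findall(text):
        features = {}
        for name, value in FEATURE.findall(body):
            if name in ("EV", "onD"):
                features[name] = value       # event code and drug-flag string stay as text
            else:
                features[name] = int(value)  # normalized values, health status, recovery time
        sequence.append(features)
    return sequence


example = (
    "< { (EV C)(HS 3)(RT 1)(WBC -4)(HCT -3)(PLT 0)(LMPH -1)(onD 0010000000) } "
    "{ (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT -1)(LMPH -2)(onD 0010001100) } >"
)
parsed = parse_sequence(example)
```

Each event set becomes a dictionary keyed by feature name, so a feature that is absent from an event set is simply a missing key.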
21
Approach
• Inexact Match
– Use set difference
• Partial match, feature by feature
• Assumes default partial match for missing data
– Use weakest-link/average-link
• Require minimum degree of match
• Require average degree of match
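The inexact-match idea can be sketched as follows, with event sets represented as dictionaries of feature name to value. The slides name the ingredients (feature-by-feature partial match, a default partial match for missing data, weakest-link and average-link thresholds) but not the formulas, so everything numeric here is an assumption: the scoring rule `1 - |a - b| / 8` for integer features, the default score of 0.5 for a feature present on only one side, and the thresholds 0.5 and 0.75. The function names are hypothetical as well.

```python
def feature_match(a, b):
    """Degree of match in [0, 1] for a feature present in both event sets."""
    if isinstance(a, int) and isinstance(b, int):
        # Normalized values span -4..+4, so 8 is the largest possible gap.
        return 1.0 - min(abs(a - b), 8) / 8.0
    return 1.0 if a == b else 0.0


def event_set_match(x, y, missing_default=0.5):
    """Average feature-by-feature match between two event sets.

    Features found in only one set (the set difference) receive the
    assumed default partial match instead of a computed score.
    """
    names = set(x) | set(y)
    if not names:
        return 1.0
    total = 0.0
    for name in names:
        if name in x and name in y:
            total += feature_match(x[name], y[name])
        else:
            total += missing_default
    return total / len(names)


def sequences_match(s, t, min_link=0.5, avg_link=0.75):
    """Accept only if the weakest pair and the average pair both pass."""
    if len(s) != len(t) or not s:
        return False
    scores = [event_set_match(a, b) for a, b in zip(s, t)]
    return min(scores) >= min_link and sum(scores) / len(scores) >= avg_link


a = {"EV": "C", "HS": 3, "WBC": -4}
b = {"EV": "C", "HS": 3, "WBC": -2, "PLT": 0}
score = event_set_match(a, b)
```

The weakest-link test keeps one badly mismatched event set from being hidden by good matches elsewhere, while the average-link test bounds the overall quality of the match.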
25
Results
• Validation
– Results are temporal patterns that demonstrate groups of patients had similar experience during the course of disease
– Only medical experts can assess validity of discovered patterns
– These results have been validated by the experts in the HIV Clinical Research Group
26
Results
• Given a database of patients followed for 4 to 9 years
– Discovered interesting patterns
– Interestingness has multiple dimensions
• Length
• Data that appears in the patterns
• Data that does not appear in the patterns
27
Results
• Advanced patients, subject to various OIs
<
{ (EV C)(HS 3)(RT 0)(WBC 0)(HCT -1)(PLT 0)(LMPH -3)(onD 0000000000) }
{ (EV E)(HS 3)(RT 2)(WBC 3)(HCT -1)(PLT 1)(LMPH 4)(onD 0000000000) }
{ (EV C)(HS 3)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P -3)(CD4A -1)(LMPH 0)(onD 1010000000) }
{ (EV C)(HS 3)(RT 1)(WBC -1)(HCT -1)(PLT 1)(LMPH 2)(onD 1010000000) }
{ (EV E)(HS 3)(RT 1)(WBC 2)(HCT -1)(PLT 1)(LMPH 4)(onD 0000000000) }
{ (EV C)(HS 3)(RT 1)(WBC 1)(HCT 0)(PLT 0)(CD4P -3)(CD4A -2)(LMPH 0)(onD 1010000000) }
>
28
• Advanced patients, fairly stable
<
{ (EV C)(HS 3)(RT 0)(WBC -1)(HCT -1)(PLT 1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 0010000000) }
{ (EV C)(HS 3)(RT 0)(WBC 0)(HCT 0)(PLT -1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 1010000000) }
{ (EV C)(HS 3)(RT 0)(onD 1010000000) }
{ (EV C)(HS 3)(RT 0)(WBC -2)(HCT 0)(PLT -1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 0010000000) }
{ (EV C)(HS 4)(RT 1)(WBC 1)(HCT -4)(PLT 0)(CD4P -4)(CD4A -4)(LMPH -4)(onD 0011001000) }
{ (EV C)(HS 3)(RT 3)(onD 0010000000) }
{ (EV )(HS 3)(RT 1)(WBC 0)(HCT 0)(PLT 0)(LMPH 0)(onD 0000000000) }
{ (EV C)(HS 3)(RT 0)(CD4A -4)(onD 0010000000) }
>
29
• Asymptomatic period
<
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 1)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV E)(HS 1)(RT 0)(WBC -1)(HCT 0)(PLT 1)(CD4P -1)(CD4A -2)(LMPH 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) }
{ (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) }
{ (EV E)(HS 1)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P 0)(CD4A 0)(LMPH 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
>
30
Summary
• Nine Steps of KDD
– Identify goal
– Identify target data set
– Data cleaning and preprocessing
– Data reduction and projection
– Identify data mining method
31
Summary
• Nine Steps of KDD
– Exploratory Analysis
– Data Mining
– Interpretation of Mined Patterns
– Acting on Discovered Knowledge
32
Conclusions
• Objective Met with Contributions
– Patterns discovered representing groups of patients with similar experience in course of disease
– This perspective on the data has not previously been produced
– This kind of computation on this kind of data has not previously been produced
33
Future Work
• Improve discovery algorithm
– Backtracking is a barrier to overcome
• Improve search control
• Develop heuristic for measuring interestingness
• Add ability to identify clinically identical/similar patterns