1

Discovery of Temporal Patterns in Course-of-Disease Medical Data

Jorge C. G. Ramirez, Ph.D. Candidate

Lynn L. Peterson and Diane J. Cook, Supervising Professors

2

Overview

• Objective

• Contributions

• Approach

• TEMPADIS

• Summary and Conclusions

3

Objective

• Discover patterns that represent groups of patients that had a similar course of disease for a catastrophic or chronic illness

• Motivation

– Medical

– AI

4

Contributions

• Data Preprocessing

– Normalization

– Learning Missing Data

– Learning Implicit Knowledge

• Exploratory Analysis

– Event Set Sequence Approach

5

Contributions

• Domain Understanding

– New perspective on mass of data

– Identify groups of patients for further medical study

6

Approach

• Example Events

– Laboratory Results

461 L WBC 2.70
461 L HCT 40.10
461 L PLT 239.00
461 L CD4% 19.00
461 L CD4A 188.00

7

Approach

• Example Events

– Visits

468 C CV

– Diagnoses

468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED

– Pharmacy

469 P CTM 60 CO-TRIMOXAZOLE DS
469 P AZT 200 ZIDOVUDINE 100MG

8

Approach

• Event Set Sequences

– Events

• Value Event: laboratory test result, visit

• Duration Event: pharmacy, diagnosis

– Event Set is all Events that occur in a window of time

– Event Set Sequence is all Event Sets that occur over a long period of time

9

Approach

• Example Event Set

461 L WBC 2.70
461 L HCT 40.10
461 L PLT 239.00
461 L CD4% 19.00
461 L CD4A 188.00
468 C CV
468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED
469 P CTM 60 CO-TRIMOXAZOLE DS
469 P AZT 200 ZIDOVUDINE 100MG
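The grouping of events into an Event Set can be sketched as follows. This is only an illustration, not the TEMPADIS implementation: it assumes the first field of each record is a day number and the second a record-type code, and the 14-day window and the names `Event`, `parse_event`, and `build_event_sets` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    day: int      # assumed: day number for the patient (first field)
    kind: str     # record-type code (second field), e.g. L, C, D, P
    detail: str   # remainder of the record, kept verbatim


def parse_event(line: str) -> Event:
    """Split a record into day, type code, and free-text detail."""
    day, kind, detail = line.split(maxsplit=2)
    return Event(int(day), kind, detail)


def build_event_sets(events, window_days=14):
    """Group events into consecutive windows.

    window_days is an assumed value; the slides do not state the window size.
    """
    event_sets = []
    current, start = [], None
    for ev in sorted(events, key=lambda e: e.day):
        # Close the current window once an event falls outside it.
        if start is not None and ev.day - start >= window_days:
            event_sets.append(current)
            current, start = [], None
        if start is None:
            start = ev.day
        current.append(ev)
    if current:
        event_sets.append(current)
    return event_sets


records = [
    "461 L WBC 2.70",
    "461 L HCT 40.10",
    "461 L PLT 239.00",
    "461 L CD4% 19.00",
    "461 L CD4A 188.00",
    "468 C CV",
    "468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED",
    "469 P CTM 60 CO-TRIMOXAZOLE DS",
    "469 P AZT 200 ZIDOVUDINE 100MG",
]
events = [parse_event(r) for r in records]
event_sets = build_event_sets(events)
```

With the assumed 14-day window, all nine example records fall into a single Event Set; a shorter window would split the laboratory results from the later visit, diagnosis, and pharmacy records.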

10

Approach

• Normalization

– Normal for each patient is different

– Especially when affected by a catastrophic or chronic illness

– Example: CD4A

• General Population Normal: 416 - 1751

• Well HIV-positive patient: 200 - 350

• Severely immune-compromised patient: 0 - 50

11

Approach

• Normalization (continued)

– Scale to -4…0…+4

• 0 is normal

• Each number represents a deviation from normal

• 1 and 2 are noticeable but not severe

• 3 is severe

• 4 is very severe
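One way to picture the -4…0…+4 scale is sketched below. The slides do not say how deviations are computed, so everything here is an assumption: the function `normalize`, the use of a patient-specific normal range, and the fixed `step` width per scale point are all hypothetical.

```python
import math


def normalize(value, normal_low, normal_high, step):
    """Map a raw lab value onto an integer scale from -4 to +4.

    Assumptions (not from the slides): values inside the patient's own
    normal range score 0, and each additional `step` outside the range
    adds one point of deviation, capped at 4.
    """
    if normal_low <= value <= normal_high:
        return 0
    if value < normal_low:
        return -min(4, math.ceil((normal_low - value) / step))
    return min(4, math.ceil((value - normal_high) / step))


# CD4A against the 200 - 350 range quoted for a well HIV-positive patient,
# with a hypothetical step of 50.
cd4a_score = normalize(188.0, 200, 350, 50)
```

Under these assumptions a CD4A of 188 scores -1 (just below the patient's normal), while a value of 20 would reach the -4 floor.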

12

Approach

• Replace Missing Data

– Diagnosis data very incomplete

– Learn severity of condition from pharmacy data

– Induce decision tree to classify conditions

13

Approach

• Create Health Status Categories

1 = HIV-positive asymptomatic
2 = Asymptomatic, on anti-HIV therapy
3 = Immune-compromised, on prophylactic therapy
4 = Active illness
5 = Severe active illness

14

Approach

• Learn Implicit Knowledge

– Need to augment explicit knowledge

– Recovery time is expert's implicit knowledge

– Use neural network to learn recovery time function

• 0 = Nothing to recover from

• 1-4 = weeks to recover

• 5 = 5 or more weeks to recover

15

Approach

• Categorize Pharmacy Data

– A myriad of drugs prescribed

– Need to understand significance

– Categorize by use

16

Approach

• Categories

– Nucleoside Analogs

– Protease Inhibitors

– Prophylaxis Therapies

– Intravenous antibiotics

– Anti-virals

– Anti-PCP/Toxoplasmosis

– Anti-mycobacterials

17

Approach

• Categories (continued)

– Anti-wasting syndrome

– Anti-fungals

– Chemotherapies
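The ten categories above line up with the ten-position `onD` flag that appears in the event sets later in the talk. A minimal sketch of such an encoding follows; the position order (taken from the order of the slides) and the helper names `encode_on_drugs` and `decode_on_drugs` are assumptions, not something the slides confirm.

```python
# Assumed position order: the order in which the categories are listed.
DRUG_CATEGORIES = [
    "Nucleoside Analogs",
    "Protease Inhibitors",
    "Prophylaxis Therapies",
    "Intravenous antibiotics",
    "Anti-virals",
    "Anti-PCP/Toxoplasmosis",
    "Anti-mycobacterials",
    "Anti-wasting syndrome",
    "Anti-fungals",
    "Chemotherapies",
]


def encode_on_drugs(active):
    """Return a 10-character 0/1 string, one position per drug category."""
    unknown = set(active) - set(DRUG_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return "".join("1" if c in active else "0" for c in DRUG_CATEGORIES)


def decode_on_drugs(flags):
    """Return the category names whose position is set to 1."""
    if len(flags) != len(DRUG_CATEGORIES):
        raise ValueError("expected one flag per category")
    return [c for c, f in zip(DRUG_CATEGORIES, flags) if f == "1"]


flags = encode_on_drugs({"Nucleoside Analogs", "Prophylaxis Therapies"})
```

If the assumed ordering is right, a flag such as 1010000000 would read as a patient on a nucleoside analog plus prophylaxis.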

18

Approach

• Result: Understandable representation of patient data

861 C 1.1 26.1 167 0.0 0 16 0
862 0.0 0.0 0 0.0 0 0 2 24: 30 38: 50
867 H 4.3 19.2 144 0.0 0 11 3 0: 3 22: 1 35: 2
868 H 2.2 26.2 144 0.0 0 5 3 0: 3 22: 1 35: 2
869 0.0 0.0 0 0.0 0 0 1 35: 60
874 C 1.3 32.4 0 0.0 0 17 0
889 C 1.1 30.4 154 0.0 0 36 0
890 0.0 0.0 0 0.0 0 0 3 22: 30 38: 50 39:480
923 0.0 0.0 0 0.0 0 0 1 39:480
933 H 3.6 20.4 182 0.0 0 11 3 0: 2 22: 1 39: 12

19

Approach

• Result: Understandable representation of patient data

861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0
867 H 4 4 0 -4 -1 -9 -9 -2 0 0 2 0 0 0 1 1 0 0
868 H 4 1 -2 -3 -1 -9 -9 -4 0 0 2 0 0 0 1 1 0 0
874 C 4 3 -4 -1 -9 -9 -9 0 0 0 2 0 0 0 1 1 0 0
889 C 4 2 -4 -2 -1 -9 -9 2 0 0 2 0 0 0 1 1 0 0
933 H 4 4 0 -4 0 -9 -9 -2 0 0 1 0 0 0 0 2 0 0

20

Approach

• Result: Understandable representation of patient data

< { (EV C)(HS 3)(RT 1)(WBC -4)(HCT -3)(PLT 0)(LMPH -1)(onD 0010000000) }
  { (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT -1)(LMPH -2)(onD 0010001100) }
  { (EV H)(HS 4)(RT 1)(WBC -2)(HCT -3)(PLT -1)(LMPH -4)(onD 0010001100) }
  { (EV C)(HS 4)(RT 3)(WBC -4)(HCT -1)(onD 0010001100) }
  { (EV C)(HS 4)(RT 2)(WBC -4)(HCT -2)(PLT -1)(LMPH 2)(onD 0010001100) }
  { (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT 0)(LMPH -2)(onD 0010000100) } >
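The step from a normalized row to the bracketed event-set notation can be sketched as below. The field order, the reading of -9 as a missing-value marker, and the collapse of any non-zero drug count to a 1 in `onD` are inferred from comparing the normalized row for day 861 with the first event set above; they are assumptions, and `row_to_event_set` is a hypothetical helper.

```python
FIELDS = ["HS", "RT", "WBC", "HCT", "PLT", "CD4P", "CD4A", "LMPH"]
MISSING = -9  # assumed marker for "no value recorded"


def row_to_event_set(row: str) -> str:
    """Render one normalized row as a bracketed event set."""
    # Extracted text sometimes carries an en dash instead of a minus sign.
    tokens = row.replace("\u2013", "-").split()
    visit_type = tokens[1]  # tokens[0] is the day number, not rendered
    values = [int(t) for t in tokens[2:2 + len(FIELDS)]]
    drug_counts = [int(t) for t in tokens[2 + len(FIELDS):]]

    parts = [f"(EV {visit_type})"]
    # Features with the missing marker are left out of the event set.
    parts += [f"({name} {v})" for name, v in zip(FIELDS, values) if v != MISSING]
    # Any non-zero count in a drug category becomes a 1 in the onD flag.
    on_d = "".join("1" if c > 0 else "0" for c in drug_counts)
    parts.append(f"(onD {on_d})")
    return "{ " + "".join(parts) + " }"


first = row_to_event_set("861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0")
```

Under these assumptions the row for day 861 yields the first event set of the sequence. Not every row on the slides follows this rule exactly, so the sketch should be read as an approximation of the mapping.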

21

Approach

• Inexact Match

– Use set difference

• Partial match, feature by feature

• Assumes default partial match for missing data

– Use weakest-link/average-link

• Require minimum degree of match

• Require average degree of match
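The combination of feature-by-feature partial matching with weakest-link and average-link thresholds can be illustrated roughly as follows. The scoring formula, the default score of 0.5 for missing data, the two threshold values, and the requirement that sequences have equal length are all assumptions made for this sketch; the slides name the ideas but not the formulas.

```python
DEFAULT_PARTIAL = 0.5  # assumed score when a feature is missing on one side
SCALE_SPAN = 8         # width of the -4..+4 scale; reused for all integers


def feature_match(a, b):
    """Degree of match between two feature values, from 0.0 to 1.0."""
    if a is None or b is None:
        return DEFAULT_PARTIAL
    if isinstance(a, int) and isinstance(b, int):
        return max(0.0, 1.0 - abs(a - b) / SCALE_SPAN)
    return 1.0 if a == b else 0.0


def event_set_match(s1, s2):
    """Average feature match over every feature present in either set."""
    keys = set(s1) | set(s2)
    if not keys:
        return 1.0
    return sum(feature_match(s1.get(k), s2.get(k)) for k in keys) / len(keys)


def sequences_match(seq1, seq2, min_required=0.6, avg_required=0.8):
    """Accept only if the weakest and the average event-set match pass."""
    if len(seq1) != len(seq2) or not seq1:
        return False
    scores = [event_set_match(a, b) for a, b in zip(seq1, seq2)]
    weakest = min(scores)                # weakest-link criterion
    average = sum(scores) / len(scores)  # average-link criterion
    return weakest >= min_required and average >= avg_required


a = {"EV": "C", "HS": 3, "RT": 0, "WBC": 0}
b = {"EV": "C", "HS": 3, "RT": 0, "WBC": -2}
c = {"EV": "C", "HS": 3, "RT": 0}            # WBC missing
d = {"EV": "H", "HS": 1, "RT": 4, "WBC": 4}  # a very different visit
similar = sequences_match([a, a], [b, c])
different = sequences_match([a, a], [b, d])
```

The weakest-link test stops one badly mismatched event set from being hidden by a good average, which is why `different` is rejected here even though its first pair matches closely.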

22

TEMPADIS

Raw Target Data

Data Cleaning

Data Normalization

Normalized Database

23

TEMPADIS

Normalized Database

Decision Tree

Neural Net

Reduced, Knowledge-Added Data

24

TEMPADIS

Knowledge-Added Database

Sequence Builder

Temporal Patterns

25

Results

• Validation

– Results are temporal patterns that demonstrate that groups of patients had similar experiences during the course of disease

– Only medical experts can assess validity of discovered patterns

– These results have been validated by the experts in the HIV Clinical Research Group

26

Results

• Given a database of patients followed for 4 to 9 years

– Discovered interesting patterns

– Interestingness has multiple dimensions

• Length

• Data that appears in the patterns

• Data that does not appear in the patterns

27

Results

• Advanced patients, subject to various OIs

< { (EV C)(HS 3)(RT 0)(WBC 0)(HCT -1)(PLT 0)(LMPH -3)(onD 0000000000) }
  { (EV E)(HS 3)(RT 2)(WBC 3)(HCT -1)(PLT 1)(LMPH 4)(onD 0000000000) }
  { (EV C)(HS 3)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P -3)(CD4A -1)(LMPH 0)(onD 1010000000) }
  { (EV C)(HS 3)(RT 1)(WBC -1)(HCT -1)(PLT 1)(LMPH 2)(onD 1010000000) }
  { (EV E)(HS 3)(RT 1)(WBC 2)(HCT -1)(PLT 1)(LMPH 4)(onD 0000000000) }
  { (EV C)(HS 3)(RT 1)(WBC 1)(HCT 0)(PLT 0)(CD4P -3)(CD4A -2)(LMPH 0)(onD 1010000000) } >

28

• Advanced patients, fairly stable

< { (EV C)(HS 3)(RT 0)(WBC -1)(HCT -1)(PLT 1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 0010000000) }
  { (EV C)(HS 3)(RT 0)(WBC 0)(HCT 0)(PLT -1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 1010000000) }
  { (EV C)(HS 3)(RT 0)(onD 1010000000) }
  { (EV C)(HS 3)(RT 0)(WBC -2)(HCT 0)(PLT -1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 0010000000) }
  { (EV C)(HS 4)(RT 1)(WBC 1)(HCT -4)(PLT 0)(CD4P -4)(CD4A -4)(LMPH -4)(onD 0011001000) }
  { (EV C)(HS 3)(RT 3)(onD 0010000000) }
  { (EV )(HS 3)(RT 1)(WBC 0)(HCT 0)(PLT 0)(LMPH 0)(onD 0000000000) }
  { (EV C)(HS 3)(RT 0)(CD4A -4)(onD 0010000000) } >

29

• Asymptomatic period

< { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 1)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV E)(HS 1)(RT 0)(WBC -1)(HCT 0)(PLT 1)(CD4P -1)(CD4A -2)(LMPH 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) }
  { (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) }
  { (EV E)(HS 1)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P 0)(CD4A 0)(LMPH 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) }
  { (EV C)(HS 1)(RT 0)(onD 0000000000) } >

30

Summary

• Nine Steps of KDD

– Identify goal

– Identify target data set

– Data cleaning and preprocessing

– Data reduction and projection

– Identify data mining method

31

Summary

• Nine Steps of KDD

– Exploratory Analysis

– Data Mining

– Interpretation of Mined Patterns

– Acting on Discovered Knowledge

32

Conclusions

• Objective Met with Contributions

– Patterns discovered representing groups of patients with similar experience in course of disease

– This perspective on the data has not previously been produced

– This kind of computation on this kind of data has not previously been produced

33

Future Work

• Improve discovery algorithm

– Backtracking is a barrier to overcome

• Improve search control

• Develop heuristic for measuring interestingness

• Add ability to identify clinically identical/similar patterns

34

Future Work

• Move database to new Intelligent Systems in Medicine and Biology Lab

• Bring database up to date

• Include more domain data in Event Sets

• Explore impact of new developments in HIV treatment