Constructing Binary Decision Tree for Predicting Deep Venous Thrombosis (DVT)

Christopher Nwosisi (1,2), Sung-Hyuk Cha (1), Yoo Jung An, Charles C. Tappert (1), Evan Lipsitz (2)

(1) Computer Science Department, Pace University, New York, USA

(2) Vascular Laboratory, Montefiore Medical Center, New York, USA

Statement of Problem

• Decision tree algorithms such as ID3 and C4.5 are promising for medical diagnostic applications today, but the trees they produce often suffer from excessive complexity and can even be incomprehensible.

• Especially for predicting DVT, which carries high mortality, a simple and accurate decision model is preferred by potential patients, medical technologists, and physicians before patients are sent for expensive medical examinations.

Proposed approach

• Use a genetic algorithm (GA) to minimize the complexity (size) and/or maximize the accuracy of the decision tree.

• The new approach found shorter and/or more accurate decision trees than those produced by the conventional ID3 and C4.5 algorithms.
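
The slides do not give the GA's tree encoding or operators, so the following is only a minimal sketch under stated assumptions: a fixed-depth full binary tree is stored as a list of attribute indices in heap order, each leaf is labeled with the majority class of the rows that reach it, fitness is plain accuracy on the given rows, and selection, crossover and mutation are generic choices. The names `leaf_slot`, `accuracy` and `evolve`, and the toy data, are hypothetical.

```python
import random

def leaf_slot(genome, row, depth):
    """Walk a full binary tree stored in heap order; return the leaf slot reached."""
    node = 0
    for _ in range(depth):
        # genome[node] is the attribute tested here; go left on 0, right on 1.
        node = 2 * node + 1 + row[genome[node]]
    return node - (2 ** depth - 1)  # leaves are numbered 0 .. 2**depth - 1

def accuracy(genome, rows, labels, depth):
    """Label every leaf with its majority class, then score the tree on the rows."""
    counts = [[0, 0] for _ in range(2 ** depth)]
    for row, label in zip(rows, labels):
        counts[leaf_slot(genome, row, depth)][label] += 1
    return sum(max(c) for c in counts) / len(rows)

def evolve(rows, labels, depth=2, pop_size=30, generations=40, seed=0):
    """Assumed GA loop: truncation selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    n_nodes = 2 ** depth - 1

    def fitness(genome):
        return accuracy(genome, rows, labels, depth)

    pop = [[rng.randrange(n_attrs) for _ in range(n_nodes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_nodes) if n_nodes > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover on the node list
            if rng.random() < 0.3:     # occasionally swap the attribute at one node
                child[rng.randrange(n_nodes)] = rng.randrange(n_attrs)
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Toy data (hypothetical): three binary attributes, label = attribute 0 XOR attribute 2.
rows = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ c for a, b, c in rows]
best, acc = evolve(rows, labels)
print("best genome:", best, "accuracy:", acc)
```

Because the tree shape is fixed by `depth`, this sketch trades accuracy against size only through the depth limit; a fitness function that also penalizes size would be another way to express the "minimize complexity" goal.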

DVT / VTE: Magnitude of the Problem

• DVT: 2 million
• PE: 600,000
• Silent PE: 1 million
• Post-thrombotic syndrome: 800,000
• Pulmonary hypertension: 30,000
• Death: 60,000
• Estimated cost of VTE care: $1.5 billion/year

Goldhaber SZ, et al. Lancet 1999;353:1386-19.

Clinical Problem

Patients with deep vein thrombosis have a painful, swollen leg that limits their mobility.

Montefiore Hospital Vascular Laboratory, 2008

DVT-Duplex Evaluation

Criteria for positive diagnosis:

- incompressibility of a venous segment

- visualization of thrombus

- absence of flow

Montefiore Hospital Vascular Laboratory

Database Overview

Two datasets are extracted from two databases:

• Medical History

• Physical Exam

• Diagnostic Tests

• 515 records from the Laboratory
  - 350 patients are positive for DVT
  - 165 patients are negative for DVT

• 620 records from the general registry
  - 420 patients are positive for DVT
  - 200 patients are negative for DVT
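
As a reference point for the accuracy figures reported later, the class counts above give the accuracy of a trivial classifier that always predicts the more frequent class. The short sketch below only does that arithmetic; the dictionary layout and the function name `majority_baseline` are illustrative, and the slides do not say whether the reported accuracies were measured on data with this same class balance.

```python
# Record counts as stated on the slide.
datasets = {
    "Laboratory": {"positive": 350, "negative": 165},
    "General registry": {"positive": 420, "negative": 200},
}

def majority_baseline(counts):
    """Accuracy obtained by always predicting the more frequent class."""
    return max(counts.values()) / sum(counts.values())

for name, counts in datasets.items():
    total = sum(counts.values())
    print(f"{name}: {total} records, majority-class baseline {majority_baseline(counts):.1%}")
```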

Table 1 - Database Attributes

No.  Name      Description
1    Sex       1 = male; 0 = female
2    Age       Age in years {1-99}
3    Diabetes  0 = normal; 1 = patient is receiving some treatment
4    Smoking   0 = never smoked; 1 = patient is an active smoker; 2 = patient stopped smoking
5    Surgery   0 = never had surgery; 1 = patient had previous surgery
6    Pain      0 = no pain in the leg; 1 = patient experienced pain in the leg {Right, Left or Bilateral}
7    Swelling  0 = no swelling below the knee; 1 = swelling in the leg
     DVT       0 = examination result indicates negative for DVT; 1 = examination result indicates positive for DVT

Medical History

Table 2 - Database Attributes

No.  Name                         Description
1    Sex                          1 = male; 0 = female
2    Age                          Age in years {1-99}
3    Diabetes                     0 = normal; 1 = patient is receiving some treatment
4    Smoking                      0 = never smoked; 1 = patient is an active smoker; 2 = patient stopped smoking
5    Surgery                      0 = never had surgery; 1 = patient had previous surgery
6    Swelling                     0 = no swelling below the knee; 1 = swelling in the leg
7    Chest Pain                   0 = none; 1 = pain in chest
8    Cancer                       0 = normal; 1 = positive for cancer
9    Cellulitis                   0 = normal; 1 = positive for cellulitis
10   Injury                       0 = no injury; 1 = previous and current injuries
11   Pulmonary embolism           0 = never diagnosed; 1 = previously diagnosed
12   Congestive heart failure     0 = never diagnosed; 1 = previously diagnosed
13   Obesity                      0 = obesity not specified; 1 = obesity specified
14   Accident                     0 = never had a fall; 1 = previously had a fall
15   Hyperlipidemia               0 = normal; 1 = patient is diagnosed
16   Cardiac dysrhythmia          0 = normal; 1 = patient is diagnosed
17   Lymphoproliferative disease  0 = normal; 1 = patient is diagnosed
     DVT                          0 = examination result indicates negative for DVT; 1 = examination result indicates positive for DVT

Medical History

Physical Exam

Diagnostic Tests

Sex Age Diabetes Smoking surgery pain swelling DVT

M 77 y no y n n yes

M 53 n no y n n yes

M 55 n yes n n y yes

F 73 n no y n y yes

F 84 y no y n n yes

F 68 n yes y n n yes

F 81 n no y n n yes

M 84 y yes n n n yes

F 84 y no y n n yes

M 84 n no y n n yes

F 73 n no y n y yes

F 56 n no n n y yes

M 63 n no n n n yes

F 76 y no y n n yes

F 70 y no y n n yes

M 75 n no y n n yes

F 92 n no n n n no

F 73 n no y y n no

F 61 n stopped n y n no

M 63 y stopped y n n no

M 78 n no y n n no

F 96 n no y n n no

F 71 n no y n n no

M 71 n no n n y no

Table 2.1.1.1 - DVT sample data set II

DVT database (Table 1)

AGE SEX Ob Sm Swell CHF Canc Surg ChestPain Lip Lymp CardDysr DB OthrPE ACC/Fall LegInj LegCell DVT

50 M 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1

82 F 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1

88 F 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1

67 F 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1

83 F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

79 M 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1

54 M 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1

69 M 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1

68 M 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1

62 M 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1

26 F 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1

64 F 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1

80 F 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1

82 F 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1

78 M 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

33 F 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1

26 M 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1

54 M 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1

45 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

47 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

74 F 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1

60 F 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

58 M 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1

42 F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

63 M 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1

45 F 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1

30 F 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

87 F 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0

77 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

97 F 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

88 F 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0

18 M 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

85 F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

35 M 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

68 F 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0

48 F 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0

85 M 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0

68 M 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

42 F 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

DVT database (Table II)

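Datasets Relationship

[Figure: relationship between the attributes of Dataset I and Dataset II. Attribute codes shown: GN, CP, PN, SM, SB, SS, CR, CL, IJ, PE, HF, OB, AC, LP, CD, LD, A60, DB, SR, SW.]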

Preprocessing (Binarization)

Original table (heterogeneous attribute types):

Sex  Smoking  …  Pain  DVT
M    no          N     yes
F    no          L     yes
F    yes         Bi    yes
F    no          N     yes
M    yes         N     yes
F    stopped     R     no
M    no          N     no

Binary table (homogeneous binary attribute types):

Sex  Smoking (SB SS)  …  Pain (LP RP)  DVT
1    0 0                 0 0           1
0    0 0                 1 0           1
0    1 1                 1 1           1
0    0 0                 0 0           1
1    1 1                 0 0           1
0    1 0                 0 1           0
1    0 0                 0 0           0

Why Binary Attribute?

• Applying the GA to non-binary attributes is extremely difficult and is currently an open problem.

• To use the GA to build a binary decision tree, all attributes must be binary.

Age Distributions (numeric)

Nominal type attributes (|v| > 2)

Leg Pain {L, R, Bi, N}

LP  RP  v
1   1   Bi
1   0   L
0   1   R
0   0   None

Smoking {N, Stopped, Yes}

SB  SS  v
1   1   Smoking
1   0   Stopped
0   0   None
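
The two-bit codes above can be applied mechanically. The sketch below is one possible way to do it, using the (LP, RP) and (SB, SS) codes from the slide and the value spellings from the sample table (`no`, `stopped`, `yes`; `N`, `L`, `R`, `Bi`). The function name `binarize` and the output column order are assumptions made for illustration, and the numeric Age attribute is left out because the slides do not state the threshold used for it.

```python
# Two-bit codes as given on the slide.
PAIN_BITS = {"N": (0, 0), "L": (1, 0), "R": (0, 1), "Bi": (1, 1)}       # (LP, RP)
SMOKING_BITS = {"no": (0, 0), "stopped": (1, 0), "yes": (1, 1)}         # (SB, SS)

def binarize(sex, smoking, pain, dvt):
    """Return [Sex, SB, SS, LP, RP, DVT] as 0/1 integers."""
    sb, ss = SMOKING_BITS[smoking]
    lp, rp = PAIN_BITS[pain]
    return [int(sex == "M"), sb, ss, lp, rp, int(dvt == "yes")]

# Rows taken from the preprocessing example on the slides.
sample = [
    ("M", "no", "N", "yes"),
    ("F", "no", "L", "yes"),
    ("F", "yes", "Bi", "yes"),
    ("F", "stopped", "R", "no"),
]
for record in sample:
    print(record, "->", binarize(*record))
```

Each three- or four-valued attribute becomes two binary columns, so every node of the resulting tree can ask a yes/no question.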

Dataset I Binarized Table

A60  GN  DB  SM  SR  PN  SW  DVT
1    1   0   0   0   0   0   1
0    0   0   0   0   0   1   1
1    0   0   0   1   0   0   1
1    1   0   0   1   0   0   1
0    1   0   0   1   0   0   1
0    1   0   0   1   0   1   0
0    0   0   0   0   0   0   0
1    0   0   0   0   0   0   0
0    0   0   0   0   0   0   0
1    1   1   0   0   0   0   0

A60 GN DB OB SM SR SW HF CR CP HL LD CD PE AC IJ CL DVT

0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1

1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1

1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1

1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1

0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1

1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Dataset II Binarized Table

Decision Tree

Decision trees represent acquired knowledge in tree form, which is intuitive and generally easy for humans to assimilate. In general, DT classifiers have accuracy comparable to that of more complex classifiers but are simpler to understand and visualize.

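[Figure: example decision tree with root SR and internal nodes HF, PE, CR and SW; branches are labeled 1 and 0, and the pos/neg leaves carry counts such as (17/25), (12/13), (11/12) and (10/10).]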

Decision Tree Representation

• Decision trees classify instances by sorting them down the tree from the root to a leaf node, which provides the classification of the instance.

• Each internal node in the tree specifies a test of some attribute of the instance.

• Each branch descending from that node corresponds to one of the possible values of this attribute.

• Each leaf node assigns a classification.
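
The representation described on this slide can be sketched in a few lines. In the illustration below, a node is either a leaf label or a tuple of (attribute, subtree for value 0, subtree for value 1); the small tree and the attribute values are made up for the example and are not one of the trees from the slides.

```python
# A node is either a leaf label ("pos" / "neg") or a tuple
# (attribute, subtree_if_0, subtree_if_1). This tree is hypothetical.
tree = ("SR",
        ("SW", "neg", "pos"),
        ("HF", "pos", "neg"))

def classify(node, instance):
    """Sort an instance from the root to a leaf; return (label, questions asked)."""
    questions = 0
    while not isinstance(node, str):
        attribute, if_zero, if_one = node
        node = if_one if instance[attribute] else if_zero  # follow the matching branch
        questions += 1
    return node, questions

print(classify(tree, {"SR": 0, "SW": 1, "HF": 0}))
print(classify(tree, {"SR": 1, "SW": 0, "HF": 1}))
```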

Decision Trees from Dataset I

[Figure: three decision trees built from Dataset I over the attributes SR, PN, SB, SS, DB, A6, GN and SW, with leaves p and n: (a) 59.5% by C4.5; (b) 61.5% by GA; (c) 64.5% by GA.]

Decision Trees from Dataset II – Figure 5

[Figure 5: four decision trees built from Dataset II, with leaves p and n: (a) C4.5 (72.25%), depth = 12; (b) 69.75% by GA, depth = 5; (c) 73.75% by GA; (d) 75.25% by GA, depth = 7.]

The Best Measure of Efficiency (shortness) for a DT

• Average number of questions required to obtain a prediction.

Other measures:

• the depth of the tree

• the number of nodes in the tree
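
The three measures can be computed directly from a tree. The sketch below assumes that the average number of questions is taken over the instances being classified (the slides do not say whether it is averaged over instances or over leaves), and it uses a made-up tree and made-up instances; the function names are illustrative.

```python
# A node is either a leaf label or (attribute, subtree_if_0, subtree_if_1).
def questions_for(node, instance):
    """Number of attribute tests needed to reach a leaf for one instance."""
    asked = 0
    while not isinstance(node, str):
        attribute, if_zero, if_one = node
        node = if_one if instance[attribute] else if_zero
        asked += 1
    return asked

def average_questions(tree, instances):
    return sum(questions_for(tree, x) for x in instances) / len(instances)

def depth(node):
    if isinstance(node, str):
        return 0
    return 1 + max(depth(node[1]), depth(node[2]))

def node_count(node):
    if isinstance(node, str):
        return 1
    return 1 + node_count(node[1]) + node_count(node[2])

# Hypothetical unbalanced tree and instances.
tree = ("SR", "neg", ("SW", "pos", ("HF", "neg", "pos")))
instances = [
    {"SR": 0, "SW": 0, "HF": 0},
    {"SR": 1, "SW": 0, "HF": 0},
    {"SR": 1, "SW": 1, "HF": 0},
    {"SR": 1, "SW": 1, "HF": 1},
]
print(average_questions(tree, instances), depth(tree), node_count(tree))
```

In an unbalanced tree the average number of questions can be well below the depth, which is why it says more about everyday use than depth or node count alone.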

Complexity of Decision Trees

Algorithm  Depth limit  Performance rate (%)  Average # of questions
GA         5            69.75                 2.9525
GA         6            73.75                 3.3725
GA         7            75.25                 3.8955
GA         8            76.50                 4.3275
GA         9            76.75                 4.8225
GA         10           78.00                 5.1225
GA         11           78.50                 5.4675
GA         12           79.50                 5.8675
GA         13           80.25                 6.3075
C4.5       12           72.25                 7.485
ID3        16           80.0

From both a depth and an average-number-of-questions perspective, the decision tree in Figure 5(d) can be considered much more efficient (simpler) than the decision tree from the C4.5 algorithm (Figure 5(a)).
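
To put numbers on this comparison, the short calculation below takes the depth, performance rate and average number of questions reported in the complexity table for the C4.5 tree and for the GA tree with depth limit 7. The variable names are illustrative; only the arithmetic is added here.

```python
# Figures reported on the slides for Dataset II.
c45 = {"depth": 12, "accuracy": 72.25, "avg_questions": 7.485}
ga_depth7 = {"depth": 7, "accuracy": 75.25, "avg_questions": 3.8955}

question_reduction = 1 - ga_depth7["avg_questions"] / c45["avg_questions"]
depth_reduction = 1 - ga_depth7["depth"] / c45["depth"]
accuracy_gain = ga_depth7["accuracy"] - c45["accuracy"]

print(f"average questions reduced by {question_reduction:.1%}")
print(f"depth reduced by {depth_reduction:.1%}")
print(f"accuracy gain of {accuracy_gain:.2f} percentage points")
```

On these figures the GA tree asks roughly half as many questions on average while reporting a performance rate three points higher.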

[Figure: the trees from Figure 5 repeated for comparison, each with its depth scale: (d) 75.25% by GA, depth = 7, and (a) C4.5 (72.25%), depth = 12.]

Optimal DT

[Figure: decision tree with root SR and internal nodes HF, PE, A6, CR, SW, DB and LP; the pos/neg leaves carry the counts (17/25), (12/13), (30/43), (20/22), (6/8), (13/16), (11/12), (10/10), (56/79) and (43/52).]

This might be the optimal decision tree based on the data. It indicates that combining human knowledge with the machine's processing speed can often produce a result superior to what either the human or the machine could produce separately.

Conclusion

• Experimental results on two datasets suggest that more accurate and more efficient decision trees can be found by the GA.

• The decision trees produced by the GA have significant clinical relevance.

• The results shown here improve the ability to predict whether a patient will develop or has had DVT, which advances the diagnosis of DVT.

Future Works

The decision trees found by the GA tend to be almost full binary trees, i.e., the width is large while the depth is short.

For future work, the C4.5 pruning mechanism could be applied to the decision trees produced by the GA to make the trees sparser and to further avoid the potential over-fitting problem.