CISC667, F05, Lec23, Liao 1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications

CISC 667 Intro to Bioinformatics (Fall 2005)

Support Vector Machines (II)

Bioinformatics Applications

Combining pairwise similarity with SVMs for protein homology detection

[Figure: workflow diagram with steps numbered 1 to 3. Protein homologs (positive train) and protein non-homologs (negative train) are converted into positive and negative pairwise score vectors, which train a support vector machine. The SVM then performs binary classification on a target protein of unknown function (testing data).]
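A minimal sketch of the vectorization step in this workflow: each protein is represented by its vector of similarity scores against every training sequence. The scoring function here is a toy Smith-Waterman local alignment (match/mismatch/gap values, the sequences, and all function names are illustrative assumptions, not taken from the slides). In practice the scores would come from an alignment tool, and the resulting vectors would be passed to an SVM trainer.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a toy scoring scheme (assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def pairwise_vector(seq, train_seqs):
    """Represent a sequence by its similarity score to every training sequence."""
    return [local_alignment_score(seq, t) for t in train_seqs]


# Hypothetical toy data: two "homologs" and two "non-homologs".
train_pos = ["MKVLAAGI", "MKVLSAGI"]
train_neg = ["GGHHPPWW", "TTSSQQNN"]
train = train_pos + train_neg
labels = [+1, +1, -1, -1]

# Training vectors (one row per training protein) and a test vector.
X_train = [pairwise_vector(s, train) for s in train]
target_vector = pairwise_vector("MKVLAAGV", train)
```

Every vector has one coordinate per training protein, so the target protein of unknown function lands in the same feature space as the training examples.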

Experiment: known protein families

Jaakkola, Diekhans and Haussler 1999

Vectorization

A measure of sensitivity and specificity

[Figure: three example rankings with ROC = 1, ROC = 0, and ROC = 0.67.]

ROC: the receiver operating characteristic score is the normalized area under a curve that plots true positives as a function of false positives.
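The definition above can be sketched directly: walk down a ranked list, and each time a false positive appears, add the number of true positives seen so far; then normalize by (positives × negatives). The function name and the example rankings are illustrative assumptions, not the rankings drawn on the slide.

```python
def roc_score(ranked_labels):
    """Normalized area under the true-positives vs. false-positives curve.

    ranked_labels: 1 for a true member, 0 for a non-member,
    ordered from the highest classifier score to the lowest.
    """
    positives = sum(ranked_labels)
    negatives = len(ranked_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")
    true_positives = 0
    area = 0
    for label in ranked_labels:
        if label == 1:
            true_positives += 1      # curve steps up
        else:
            area += true_positives   # curve steps right; add column height
    return area / (positives * negatives)


perfect = roc_score([1, 1, 0, 0])   # all positives ranked first
worst = roc_score([0, 0, 1, 1])     # all positives ranked last
mixed = roc_score([0, 1, 0, 0])     # one negative outranks the positive
```

A perfect ranking gives 1, a fully inverted ranking gives 0, and the mixed ranking here gives 2/3.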

Performance Comparison (1)

Using Phylogenetic Profiles & SVMs

Gene: YAL001C

E-value    Phylogenetic profile
0.122      1
1.064      0
3.589      0
0.008      1
0.692      1
8.49       0
14.79      0
0.584      1
1.567      0
0.324      1
0.002      1
3.456      0
2.135      0
0.142      1
0.001      1
0.112      1
1.274      0
0.234      1
4.562      0
3.934      0
0.489      1
0.002      1
2.421      0
0.112      1
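The table is consistent with a simple thresholding rule: an E-value below 1 gives a profile bit of 1, otherwise 0. The cutoff of 1.0 is inferred from the listed values and is not stated on the slide, so treat it as an assumption. The function name is hypothetical.

```python
# E-values for YAL001C as listed in the table above.
E_VALUES = [
    0.122, 1.064, 3.589, 0.008, 0.692, 8.49, 14.79, 0.584,
    1.567, 0.324, 0.002, 3.456, 2.135, 0.142, 0.001, 0.112,
    1.274, 0.234, 4.562, 3.934, 0.489, 0.002, 2.421, 0.112,
]


def phylogenetic_profile(e_values, threshold=1.0):
    """Binarize E-values: 1 if a hit is below the (assumed) cutoff, else 0."""
    return [1 if e < threshold else 0 for e in e_values]


profile = phylogenetic_profile(E_VALUES)
```

The resulting bit vector is the representation of the gene that a kernel or SVM would then operate on.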

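Phylogenetic Profiles and Evolution Patterns

[Figure: a phylogenetic tree with the profile x = 1 1 0 1 0 0 0 1 1 at its leaves and one possible assignment of 0/1 values to the internal nodes.]

It is impossible to know for sure whether the gene followed exactly this evolution pattern.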

Tree Kernel (Vert, 2002)

For a phylogenetic profile x and an evolution pattern e:

• P(e) quantifies how "natural" the pattern is

• P(x|e) quantifies how likely the pattern e is the "true history" of the profile x

Tree kernel:

$K_{\mathrm{tree}}(x, y) = \sum_{e} P(e)\, P(x \mid e)\, P(y \mid e)$

This can be proved to be a kernel.

Intuition: two profiles get closer in the feature space when they share common evolution patterns with high probability.
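A brute-force sketch of the sum above on a toy tree, to make the intuition concrete. This is a simplification, not Vert's actual model or algorithm: here an evolution pattern e is taken to be a 0/1 assignment to all internal nodes, P(e) and P(x|e) come from a single assumed "state is inherited with probability 0.8" rule, and the tree, the probabilities, and all names are hypothetical. Enumerating every pattern is only feasible for tiny trees.

```python
from itertools import product

# Hypothetical toy tree: root R with internal children A and B;
# leaves 0,1 hang under A and leaves 2,3 hang under B.
INTERNAL_PARENT = {"R": None, "A": "R", "B": "R"}
LEAF_PARENT = ["A", "A", "B", "B"]
ROOT_PRIOR = {0: 0.5, 1: 0.5}
STAY = 0.8  # assumed probability that a child keeps its parent's state


def transition(parent_state, child_state):
    return STAY if parent_state == child_state else 1.0 - STAY


def pattern_prob(e):
    """P(e): how 'natural' the internal-node assignment e is."""
    p = 1.0
    for node, parent in INTERNAL_PARENT.items():
        p *= ROOT_PRIOR[e[node]] if parent is None else transition(e[parent], e[node])
    return p


def profile_given_pattern(x, e):
    """P(x|e): probability of the leaf profile x given the pattern e."""
    p = 1.0
    for leaf_state, parent in zip(x, LEAF_PARENT):
        p *= transition(e[parent], leaf_state)
    return p


def tree_kernel(x, y):
    """K(x, y) = sum over e of P(e) P(x|e) P(y|e), by full enumeration."""
    nodes = list(INTERNAL_PARENT)
    total = 0.0
    for states in product((0, 1), repeat=len(nodes)):
        e = dict(zip(nodes, states))
        total += pattern_prob(e) * profile_given_pattern(x, e) * profile_given_pattern(y, e)
    return total


same = tree_kernel((1, 1, 0, 0), (1, 1, 0, 0))
opposite = tree_kernel((1, 1, 0, 0), (0, 0, 1, 1))
```

In this toy setting, a profile compared with itself scores higher than the same profile compared with its bitwise opposite, because the two opposite profiles are never both likely under the same pattern.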

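Tree-Encoded Profile (Narra & Liao, 2004)

[Figure: a phylogenetic tree with the profile 1 1 0 1 0 0 0 1 1 at its leaves and the values 1, 0.33, 0.67, 0.34, 0.5, 0.75, 0.55 at its internal nodes.]

Post-order traversal: 1 0.33 0.67 0.34 0.5 0.75 0.55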
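The internal-node values on this slide are consistent with scoring each internal node as the mean of its children and reading the scores off in post-order. The sketch below illustrates that reading. The tree topology is an assumption: it is one topology that reproduces the slide's numbers (up to the slide's two-decimal rounding) and is not given in the source. Appending the internal scores to the original profile is likewise an assumption about how the encoded vector is assembled.

```python
def encode(node, internal_scores):
    """Post-order traversal: return a node's score, collecting internal scores."""
    if not isinstance(node, tuple):
        return float(node)                      # leaf: the profile bit itself
    child_scores = [encode(child, internal_scores) for child in node]
    score = sum(child_scores) / len(child_scores)  # assumed rule: mean of children
    internal_scores.append(score)
    return score


def leaves(node):
    """Leaf values in left-to-right order."""
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


# Assumed topology (nested tuples), with the slide's profile at the leaves.
TREE = ((((1, 1), (0, 1, 0)), 0), ((0, 1), 1))

internal = []
encode(TREE, internal)
profile = leaves(TREE)
tree_encoded = profile + internal  # assumed layout: profile, then internal scores
```

The computed scores are exact means, so they can differ from the slide in the second decimal place, where the slide appears to round at each node.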

Using Support Vector Machines

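Kernel function: [formula not recovered from the slide], where r = 0.10

Soft margin regularization: C = 1.50

Coding scheme: BIN21

$L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \delta_{ij}/C \right)$

Evaluation:

$Q_3 = (P_1 + P_2 + P_3)/N$

$C = (TP \cdot TN - FP \cdot FN) / \sqrt{PP \cdot PN \cdot AP \cdot AN}$

SOV: segment overlap accuracy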
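A minimal sketch of the first two evaluation measures, under these assumptions: P1, P2, P3 are read as the numbers of correctly predicted residues in each of three structural states and N as the total number of residues; PP, PN, AP, AN are read as predicted positives, predicted negatives, actual positives, and actual negatives for one state. The state letters (H, E, C), the example strings, and the function names are illustrative, not from the slides.

```python
from math import sqrt


def q3(true_ss, pred_ss):
    """Fraction of residues whose state is predicted correctly."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


def correlation(true_ss, pred_ss, state):
    """Correlation coefficient C for one state, treated as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_ss, pred_ss):
        if p == state and t == state:
            tp += 1
        elif p == state:
            fp += 1
        elif t == state:
            fn += 1
        else:
            tn += 1
    # PP = tp + fp, PN = tn + fn, AP = tp + fn, AN = tn + fp
    denom = sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return (tp * tn - fp * fn) / denom if denom else 0.0


truth = "HHHEEECCC"
prediction = "HHEEEECCC"   # one helix residue mispredicted as strand
accuracy = q3(truth, prediction)
helix_c = correlation(truth, prediction, "H")
```

Returning 0.0 when the denominator is zero is a convention chosen here for the degenerate case where a state is never predicted or never occurs.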

Design tertiary classifiers

Nguyen & Rajapakse, Genome Informatics 14: 218-227 (2003)

A two-stage SVM
