CZ3253: Computer Aided Drug Design. Lecture 7: Drug Design Methods II: SVM. Prof. Chen Yu Zong. Tel: 6874-6877. Email: [email protected]. http://xin.cz3.nus.edu.sg. Room 07-24, level 7, SOC1, National University of Singapore.

Classification of Drugs by SVM



Page 1: Classification of Drugs by SVM

CZ3253: Computer Aided Drug Design

Lecture 7: Drug Design Methods II: SVM

Prof. Chen Yu Zong

Tel: 6874-6877
Email: [email protected]
http://xin.cz3.nus.edu.sg

Room 07-24, level 7, SOC1, National University of Singapore

Page 2: Classification of Drugs by SVM

Classification of Drugs by SVM

• A drug is classified as either belonging (+) or not belonging (-) to a class

Examples of drug classes: inhibitor of a protein, BBB penetrating, genotoxic

Examples of protein classes: enzyme EC3.4 family, DNA-binding

• By screening against all classes, the property of a drug or the function of a protein can be identified

[Figure: a drug is passed to the Class-1 SVM, Class-2 SVM and Class-3 SVM. Each SVM returns + or -; here only one SVM returns +, and the drug belongs to Family-3.]
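The screening idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's implementation: each class is given one linear decision function w·x + b, and the class names, weights, offsets and the helper name `screen` are invented placeholders.

```python
from typing import Dict, List, Sequence, Tuple

# One (weights, offset) pair per class. All values are hypothetical.
Classifier = Tuple[Sequence[float], float]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision value w.x + b; positive means 'belongs to the class'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def screen(x: Sequence[float], classifiers: Dict[str, Classifier]) -> List[str]:
    """Return the names of all classes whose classifier answers '+' for x."""
    return [
        name
        for name, (w, b) in classifiers.items()
        if decision_value(w, b, x) > 0
    ]


# Hypothetical classifiers and a hypothetical 3-component descriptor vector.
classifiers = {
    "Class-1": ((-1.0, 0.0, 0.0), 0.0),
    "Class-2": ((0.0, 1.0, 0.0), -0.5),
    "Class-3": ((0.0, 0.0, 1.0), -0.5),
}
drug = (1.0, 0.0, 1.0)
print(screen(drug, classifiers))
```

With these placeholder numbers the first two classifiers answer - and the third answers +, mirroring the diagram.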

Page 3: Classification of Drugs by SVM

Classification of Drugs or Proteins by SVM

What is SVM?

• Support vector machines (SVMs) are a machine learning method: they learn from examples (statistical learning) and classify objects into one of two classes.

Advantages of SVM:

• Accommodates diverse class members (no discrimination by structural type).

• Uses structure-derived physico-chemical features as the basis for drug classification (the algorithm requires no structural similarity).

Page 4: Classification of Drugs by SVM

SVM References

• C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (on-line).

• R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley, 2nd edition, 2001 (section 5.11, hard copy).

• S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy).

• Online lecture notes (http://www.cs.unr.edu/~bebis/MathMethods/SVM/lecture.pdf)

• Publications on SVM drug prediction:
– J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
– J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
– Toxicol. Sci. 79, 170 (2004)

Page 5: Classification of Drugs by SVM

Machine Learning Method

Inductive learning: example-based learning

[Figure: positive examples and negative examples, each described by a descriptor.]

Page 6: Classification of Drugs by SVM

Machine Learning Method

Feature vectors:

A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

[Figure: each descriptor is encoded as a feature vector; the examples are grouped into positive examples and negative examples.]

Page 7: Classification of Drugs by SVM

SVM Method

Feature vectors in input space:

A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

[Figure: the feature vectors plotted as points (labelled B, A, E, F) in an input space with axes X, Y and Z.]

Page 8: Classification of Drugs by SVM

SVM Method

Project to a higher dimensional space

[Figure: protein family members and nonmembers, shown with the border before projection and the new border after projection.]

Page 9: Classification of Drugs by SVM

SVM Method

[Figure: protein family members and nonmembers separated by the new border; the support vectors are marked.]

Page 10: Classification of Drugs by SVM

SVM Method

[Figure: protein family members and nonmembers separated by the new border; the support vectors are marked.]

Page 11: Classification of Drugs by SVM

Best Linear Separator?

Page 12: Classification of Drugs by SVM

Best Linear Separator?

Page 13: Classification of Drugs by SVM

Find Closest Points in Convex Hulls

[Figure: the convex hulls of the two classes with their closest points c and d.]

Page 14: Classification of Drugs by SVM

Plane Bisect Closest Points

$$x \cdot w = b, \qquad w = d - c$$

[Figure: the separating plane bisects the segment between the closest points c and d.]
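A small Python sketch of this construction, assuming the closest points c and d are already known. The slide gives the plane x·w = b and the normal w = d - c; the offset used below, b = w·(c + d)/2, is not on the slide and follows from requiring the plane to pass through the midpoint of c and d. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def bisecting_plane(c: Sequence[float], d: Sequence[float]) -> Tuple[List[float], float]:
    """Plane x.w = b with normal w = d - c through the midpoint of c and d."""
    w = [di - ci for ci, di in zip(c, d)]
    midpoint = [(ci + di) / 2.0 for ci, di in zip(c, d)]
    b = dot(w, midpoint)  # assumption: the plane contains the midpoint
    return w, b


def side(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """+1 on the d side of the plane, -1 on the c side, 0 on the plane."""
    value = dot(x, w) - b
    return (value > 0) - (value < 0)


w, b = bisecting_plane(c=(0.0, 0.0), d=(2.0, 0.0))
print(w, b, side((2.0, 0.0), w, b), side((0.0, 0.0), w, b))
```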

Page 15: Classification of Drugs by SVM

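Find using quadratic program

$$\min_{\alpha}\ \tfrac{1}{2}\,\|c - d\|^2, \qquad c = \sum_{i \in \text{Class } 1} \alpha_i x_i, \quad d = \sum_{i \in \text{Class } -1} \alpha_i x_i$$

$$\text{s.t.}\quad \sum_{i \in \text{Class } 1} \alpha_i = 1, \quad \sum_{i \in \text{Class } -1} \alpha_i = 1, \quad \alpha_i \ge 0, \quad i = 1, \dots, \ell$$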

Many existing and new solvers.
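As one illustration of how the closest points c and d can be found, the sketch below uses a plain Frank-Wolfe iteration with exact line search. This is a swapped-in teaching method chosen because it needs only the standard library; it is not claimed to be one of the solvers the slide has in mind, and the sample points are invented.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dot(u: Point, v: Point) -> float:
    return sum(a * b for a, b in zip(u, v))


def combine(points: Sequence[Point], weights: Sequence[float]) -> List[float]:
    """Convex combination sum_i weights[i] * points[i]."""
    dim = len(points[0])
    return [sum(wt * p[k] for wt, p in zip(weights, points)) for k in range(dim)]


def closest_hull_points(
    pos: Sequence[Point], neg: Sequence[Point], iters: int = 1000, tol: float = 1e-12
) -> Tuple[List[float], List[float]]:
    """Minimise 0.5*||c - d||^2 over the two convex hulls (Frank-Wolfe)."""
    alpha = [1.0] + [0.0] * (len(pos) - 1)  # weights for Class 1, sum to 1
    beta = [1.0] + [0.0] * (len(neg) - 1)   # weights for Class -1, sum to 1
    for _ in range(iters):
        c, d = combine(pos, alpha), combine(neg, beta)
        r = [ci - di for ci, di in zip(c, d)]
        # Vertices that most decrease the linearised objective.
        i = min(range(len(pos)), key=lambda k: dot(pos[k], r))
        j = max(range(len(neg)), key=lambda k: dot(neg[k], r))
        s = [pos[i][k] - neg[j][k] - r[k] for k in range(len(r))]
        gap = -dot(r, s)  # Frank-Wolfe optimality gap, always >= 0
        if gap <= tol:
            break
        step = min(1.0, gap / dot(s, s))  # exact line search, clipped to [0, 1]
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[i] += step
        beta = [(1.0 - step) * bt for bt in beta]
        beta[j] += step
    return combine(pos, alpha), combine(neg, beta)


class_pos = [(1.0, 1.0), (1.0, -1.0), (3.0, 0.0)]
class_neg = [(-1.0, 0.0), (-2.0, 1.0), (-2.0, -1.0)]
c, d = closest_hull_points(class_pos, class_neg)
print(c, d)
```

In this toy data the closest point of the first hull lies on an edge rather than at a vertex, which is why the weights α are needed.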

Page 16: Classification of Drugs by SVM

Best Linear Separator: Supporting Plane Method

$$x \cdot w = b + 1$$

$$x \cdot w = b - 1$$

Maximize the distance between two parallel supporting planes

$$\text{Distance} = \text{"Margin"} = \frac{2}{\|w\|}$$
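The margin formula can be checked numerically with a short sketch. The assignment of classes to planes (class +1 on or beyond x·w = b + 1, class -1 on or beyond x·w = b - 1) is an assumption made here for illustration, as are the function names and the sample points.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def margin(w: Sequence[float]) -> float:
    """Distance between the planes x.w = b + 1 and x.w = b - 1."""
    return 2.0 / math.sqrt(dot(w, w))


def respects_supporting_planes(points, labels, w, b) -> bool:
    """True if every point lies on or outside its class's supporting plane."""
    for x, y in zip(points, labels):
        value = dot(x, w)
        if y == 1 and value < b + 1:
            return False
        if y == -1 and value > b - 1:
            return False
    return True


points = [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0)]
labels = [1, 1, -1]
print(margin((1.0, 0.0)), respects_supporting_planes(points, labels, (1.0, 0.0), 0.0))
```

Scaling w down widens the margin, but only as long as the two constraints still hold, which is why the problem is posed as a constrained optimization.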

Page 17: Classification of Drugs by SVM

Best Linear Separator?

Page 18: Classification of Drugs by SVM

SVM Method

Border line is nonlinear

Page 19: Classification of Drugs by SVM

SVM Method

Non-linear transformation: use of kernel function
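To make the role of the kernel function concrete, the sketch below checks a standard identity: for 2-D inputs, the degree-2 polynomial kernel (x·z)² equals an ordinary dot product after the explicit map φ(x) = (x1², √2·x1·x2, x2²). The slide does not say which kernel the lecture uses; the polynomial and RBF kernels here are common choices, and the parameter `gamma` is an illustrative name.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit map from 2-D input space to a 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Degree-2 polynomial kernel, computed without leaving input space."""
    return dot(x, z) ** 2


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))
```

The two printed numbers agree up to rounding, so a linear border in the 3-D feature space corresponds to a nonlinear border in the original 2-D space.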

Page 20: Classification of Drugs by SVM

SVM Method

Non-linear transformation

Page 21: Classification of Drugs by SVM

SVM Method

Page 22: Classification of Drugs by SVM

SVM Method

Page 23: Classification of Drugs by SVM

SVM Method

Page 24: Classification of Drugs by SVM

SVM Method

Page 25: Classification of Drugs by SVM

SVM for Classification of Drugs

How to represent a drug?

• Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties:
– Simple molecular properties (molecular weight, no. of rotatable bonds, etc.; 18 in total)
– Molecular connectivity and shape (28 in total)
– Electro-topological state polarity (84 in total)
– Quantum chemical properties (electric charge, polarizability, etc.; 13 in total)
– Geometrical properties (molecular size vector, van der Waals volume, molecular surface, etc.; 16 in total)

J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
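The descriptor groups listed above add up to 18 + 28 + 84 + 13 + 16 = 159 components. The sketch below shows one way such a vector might be assembled; the group keys, the fixed ordering and the function name are assumptions for illustration, and the zero values stand in for real computed descriptors.

```python
from typing import Dict, List, Sequence

# Group sizes taken from the slide; the short keys are hypothetical labels.
GROUP_SIZES = {
    "simple_molecular": 18,
    "connectivity_shape": 28,
    "electrotopological": 84,
    "quantum_chemical": 13,
    "geometrical": 16,
}


def assemble_feature_vector(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the descriptor groups in a fixed order into one vector."""
    vector: List[float] = []
    for name, size in GROUP_SIZES.items():
        values = groups[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


# Placeholder descriptor values (all zeros) for a single structure.
placeholder = {name: [0.0] * size for name, size in GROUP_SIZES.items()}
print(len(assemble_feature_vector(placeholder)))
```

Keeping the group order fixed matters because the SVM sees only positions in the vector, not descriptor names.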

Page 26: Classification of Drugs by SVM

SVM Feature Selection

CACO2 - 718 descriptors, average of 10 models

[Figure: scatter plot of predicted RT (min) against observed RT (min); both axes run from -8 to -3.]

Test Q2 = .7073

Q2 is MSE scaled by variance:

Q2 = (mean square error) / (true variance)
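A direct translation of this definition into Python, as a sketch. "True variance" is taken here to mean the population variance of the observed values, which is an assumption; under this definition a perfect model scores 0 and a model that always predicts the mean scores 1, so smaller is better.

```python
from typing import Sequence


def q2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error divided by the variance of the observed values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return mse / variance


observed = [1.0, 2.0, 3.0]
print(q2(observed, observed))         # perfect predictions
print(q2(observed, [2.0, 2.0, 2.0]))  # always predicting the mean
```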

Page 27: Classification of Drugs by SVM

Feature Selection

Using a subset of descriptors might greatly improve results.

• Do feature selection using a linear SVM with 1-norm regularization

[Figure: 1-norm compared with 2-norm.]

Page 28: Classification of Drugs by SVM

Feature Selection via Sparse SVM/LP

• Construct linear ν-SVM using 1-norm LP:

$$\min_{w,\,b,\,z,\,z^*,\,\epsilon}\ C \sum_{i=1}^{\ell} (z_i + z_i^*) + C \nu \epsilon + \|w\|_1$$

$$\text{s.t.}\quad x_i \cdot w + b - y_i \le \epsilon + z_i, \qquad x_i \cdot w + b - y_i \ge -\epsilon - z_i^*, \qquad z_i,\ z_i^*,\ \epsilon \ge 0, \quad i = 1, \dots, \ell$$

• Pick best C, ν for SVM

• Keep descriptors with nonzero coefficients, $|w_i| > 0$

Page 29: Classification of Drugs by SVM

Bagged Feature Selection

[Flowchart: Partition Training Data (Training Set, Validation Set) → Linear SVM Algorithm for Feature Selection → A Linear Regression Model; repeat B times → Bag B Models and Obtain Subset of Features.]

Make 20 models of the form

$$w \cdot x + b = w_1 x_1 + w_2 x_2 + \dots + w_{718} x_{718} + w_r r + b$$

with only a few $w_i \ne 0$, where r is a random variable.

Keep attributes with $|w_i| > |w_r|$.
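The selection rule can be sketched as follows. The sketch assumes the models have already been fitted and that weights are compared by their mean absolute value across the bagged models; that averaging step, the dictionary layout and the descriptor names x1..x3 are assumptions for illustration, not details given on the slide.

```python
from typing import Dict, List


def bagged_selection(models: List[Dict[str, float]], probe: str = "r") -> List[str]:
    """Keep descriptors whose mean |weight| exceeds that of the random probe."""
    n_models = len(models)

    def mean_abs(name: str) -> float:
        return sum(abs(m[name]) for m in models) / n_models

    threshold = mean_abs(probe)
    return [
        name
        for name in models[0]
        if name != probe and mean_abs(name) > threshold
    ]


# Hypothetical weights from two sparse linear models; "r" is the random variable.
models = [
    {"x1": 0.9, "x2": 0.0, "x3": 0.05, "r": 0.1},
    {"x1": -1.1, "x2": 0.02, "x3": 0.3, "r": -0.1},
]
print(bagged_selection(models))
```

The random variable acts as a noise floor: a descriptor that does not outweigh a column of pure noise is unlikely to carry signal.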

Page 30: Classification of Drugs by SVM

Bagged SVM (RBF)

CACO2 - 31 Descriptors

[Figure: scatter plot of predicted RT (min) against observed RT (min); both axes run from -8 to -3.]

Test Q2 = .134

Page 31: Classification of Drugs by SVM

Starplot Caco2 - 31 Descriptors

ABSDRN6

a.don

KB54

SMR.VSA2

BNP8

DRNB10

KB11

PEOE.VSA.FPPOS

ANGLEB45

PIPB53

DRNB00

PEOE.VSA.4

SlogP.VSA6

apol

ABSFUKMIN

PIPB04

PEOE.VSA.FPOL

PIPMAX

BNPB50

BNPB21

PEOE.VSA.FHYD

PEOE.VSA.PPOS

EP2

SlogP.VSA9

ABSKMIN

PEOE.VSA.FNEG

BNPB31

FUKB14

pmiZ

SIKIA

SlogP.VSA0

Page 32: Classification of Drugs by SVM

Chemistry In/Out Modeling

Feature Selection

Visualize Features

Assess Chemistry

Construct SVM Nonlinear Model

Data + Descriptors

SVM Model

Test Data

Predict Bioactivities

Chemistry Interpretation

Page 33: Classification of Drugs by SVM

Bagged SVM (RBF)

CACO2 - 15 Descriptors

[Figure: scatter plot of predicted RT (min) against observed RT (min); both axes run from -8 to -3.]

Test Q2 = .166

Page 34: Classification of Drugs by SVM

CACO2 – 15 Variables

a.don

KB54

SMR.VSA2

ANGLEB45

DRNB10

ABSDRN6

PEOE.VSA.FPPOS

DRNB00

PEOE.VSA.FNEG

ABSKMIN

SIKIA

pmiZ

BNPB31

FUKB14

SlogP.VSA0

Page 35: Classification of Drugs by SVM

Chemical Insights

• Hydrophobicity - a.don
• Size and shape - ABSDRN6, SMR.VSA2, ANGLEB45, pmiZ: large is bad, flat is bad, globular is good.
• Polarity - PEOE.VSA.FPPOS, PEOE.VSA.FNEG: negative partial charge is good.

Corresponds to conventional wisdom: the rule of 5.

Page 36: Classification of Drugs by SVM

Hybrid TAE/SHAPE

• Shape is an important overall factor
– DRNB10, DRNB00: del rho dot N
– BNPB31: bare nuclear potential
– KB54: kinetic energy descriptors; very large lipophilic molecules don't work
– FUKB14: Fukui surface

• Interpretations are difficult
• They point to chemistry challenges/hypotheses

Page 37: Classification of Drugs by SVM

Final SVM Approach

• Construct a large set of descriptors.
• Perform feature selection:
– Sensitivity analysis or SVM-LP
• Construct many SVM models:
– Optimize using QP or LP
– Evaluate by validation set or leave-one-out
– Select best models by grid or pattern search
• Bag the best k models to create the final function
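The last step, bagging the best k models, can be sketched as below. The sketch assumes each model is a callable returning a numeric prediction, that "best" means lowest validation error, and that bagging means averaging the k predictions; the slide does not spell out these details, and the toy models are invented.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def bag_best_k(models: Sequence[Model], validation_errors: Sequence[float], k: int) -> Model:
    """Average the predictions of the k models with the lowest validation error."""
    ranked = sorted(range(len(models)), key=lambda i: validation_errors[i])[:k]
    chosen: List[Model] = [models[i] for i in ranked]

    def final_function(x: float) -> float:
        return sum(model(x) for model in chosen) / len(chosen)

    return final_function


# Three toy stand-in models with hypothetical validation errors.
models = [lambda x: x + 1.0, lambda x: x - 1.0, lambda x: x + 10.0]
errors = [0.1, 0.2, 5.0]
final = bag_best_k(models, errors, k=2)
print(final(3.0))
```

Here the third model is discarded for its high validation error and the remaining two are averaged.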

Page 38: Classification of Drugs by SVM

Drug Discovery Results (LOO)

Data      # Sample   # Var. Full   # Var. FS (Avg)   Q2 Full   Q2 FS
Caco2     27         713           41                0.33      0.29
Barrier   62         569           51                0.31      0.28
HIV       64         561           17                0.46      0.40
Cancer    46         362           34                0.50      0.16
LCCK      66         350           69                0.40      0.37
Aquasol   197        525           57                0.08      0.06

Page 39: Classification of Drugs by SVM

SVM-based drug design and property prediction software

Useful for inhibitor/activator/substrate prediction, drug safety and pharmacokinetic prediction.

[Figure: workflow. Your drug structure (chemical structure) is entered in one of two ways (Option 1, Option 2): input structure through internet, or input structure on local machine. The structure is sent to the classifier on a computer loaded with SVMProt, which holds a support vector machine classifier for every drug class. Output: the identified classes (which class does your drug belong to?), giving the drug designed or property predicted.]

http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi

J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).

Page 40: Classification of Drugs by SVM

SVM Drug Prediction Results

Protein inhibitor/activator/substrate prediction:

• 86% of the 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted.

• 81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted.

Drug toxicity prediction:

• 97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted.

• 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted.

Pharmacokinetics prediction:

• 95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted.

• 90% of 131 human intestine absorption and 80% of 65 non-absorption agents correctly predicted.

J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
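The per-class figures above can be combined into a single overall accuracy by weighting each percentage by its sample count. The sketch below does this arithmetic; because the slide reports rounded percentages, the results are approximate, and the dictionary keys are shorthand labels introduced here.

```python
from typing import Dict, Tuple

# (positive accuracy, n positive, negative accuracy, n negative), from the slide.
REPORTED: Dict[str, Tuple[float, int, float, int]] = {
    "estrogen receptor activators": (0.86, 129, 0.84, 101),
    "P-glycoprotein substrates": (0.81, 116, 0.79, 85),
    "TdP": (0.97, 102, 0.84, 243),
    "genotoxicity": (0.73, 229, 0.93, 631),
    "BBB penetration": (0.95, 276, 0.82, 139),
    "human intestine absorption": (0.90, 131, 0.80, 65),
}


def overall_accuracy(pos_acc: float, n_pos: int, neg_acc: float, n_neg: int) -> float:
    """Sample-weighted accuracy over the positive and negative sets."""
    return (pos_acc * n_pos + neg_acc * n_neg) / (n_pos + n_neg)


for name, figures in REPORTED.items():
    print(f"{name}: {overall_accuracy(*figures):.3f}")
```

For the estrogen receptor set this gives (0.86 × 129 + 0.84 × 101) / 230 ≈ 0.851.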