
John Mitchell; James McDonagh ; Neetika Nath


Page 1: John Mitchell; James  McDonagh ;  Neetika Nath


John Mitchell; James McDonagh; Neetika Nath

Rob Lowe; Richard Marchese Robinson


RF-Score: a Machine Learning Scoring Function for Protein-Ligand Binding Affinities

• Ballester, P.J. & Mitchell, J.B.O. (2010) Bioinformatics 26, 1169-1175


Calculating the affinities of protein-ligand complexes:

· For docking

· For post-processing docking hits

· For virtual screening

· For lead optimisation

· For 3D QSAR

· Within series of related complexes

· For any general complex

· Absolute (hard!)

· Relative

A difficult, unsolved problem.

Three existing approaches …

1. Force fields

2. Empirical Functions

3. Knowledge based


How knowledge-based scoring functions have worked …

· P-L complexes from PDB

· Assign atoms to types

· Find histograms of type-type distances

· Convert to an ‘energy’

· Add up the energies from all P-L atom pairs
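The histogram step above can be sketched as follows. This is a minimal illustration only: the function name `distance_histograms`, the bin width, the 8 Å range and the toy coordinates are all assumptions made for the example, not details taken from the slides.

```python
import math
from collections import defaultdict

def distance_histograms(complexes, bin_width=0.5, r_max=8.0):
    """Count protein-ligand atom-pair distances per (protein type, ligand type).

    complexes: list of (protein_atoms, ligand_atoms); each atom is (type, (x, y, z)).
    bin_width and r_max are illustrative values in Angstroms (assumed).
    """
    n_bins = int(r_max / bin_width)
    hist = defaultdict(lambda: [0] * n_bins)
    for protein_atoms, ligand_atoms in complexes:
        for p_type, p_xyz in protein_atoms:
            for l_type, l_xyz in ligand_atoms:
                r = math.dist(p_xyz, l_xyz)
                if r < r_max:
                    # One histogram per type-type pair, one count per distance bin
                    hist[(p_type, l_type)][int(r / bin_width)] += 1
    return dict(hist)

# Toy complex (made-up coordinates): two protein atoms, one ligand atom
toy = [(
    [("N", (0.0, 0.0, 0.0)), ("O", (5.0, 0.0, 0.0))],
    [("O", (3.0, 0.0, 0.0))],
)]
hists = distance_histograms(toy)
```

In a real setting the atom types would be richer than bare elements and the counts would be accumulated over many complexes from the PDB.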

[Figure: Nitrogen-Oxygen Distance Distribution. x-axis: Distance / Ångströms (2 to 8); y-axis: Number observed (0 to 1200)]


· This conversion of the histogram into an energy function uses a “reverse Boltzmann” methodology.

· Thus it “assumes” that the atoms of protein and ligand are independent particles in equilibrium at temperature T.

· For a variety of reasons, these are poor assumptions …
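For concreteness, a generic "reverse Boltzmann" conversion might look like the sketch below. It assumes the common form E(r) = -kT ln(p_obs(r) / p_ref(r)); the slides do not give the exact formula or reference state used, so the function name `reverse_boltzmann`, the uniform reference histogram and the value of kT are assumptions for illustration.

```python
import math

K_B_T = 0.593  # kcal/mol at roughly 298 K (assumed units and temperature)

def reverse_boltzmann(observed, reference, kT=K_B_T):
    """Turn a distance histogram into a pseudo-energy per bin.

    observed:  counts per distance bin for one atom-type pair
    reference: counts per bin for a reference state (choice is an assumption)
    Bins with a zero count are returned as None (energy undefined).
    """
    total_obs = sum(observed)
    total_ref = sum(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        if obs == 0 or ref == 0:
            energies.append(None)
        else:
            ratio = (obs / total_obs) / (ref / total_ref)
            energies.append(-kT * math.log(ratio))
    return energies

# Toy histogram: the third bin is over-represented relative to a flat reference
energies = reverse_boltzmann([10, 30, 60], [1, 1, 1])
```

Bins observed more often than the reference come out with negative "energy", and rarer bins with positive "energy", which is exactly the equilibrium-style reading the slides go on to question.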


· Molecular connectivity: atom-atom distances are miles from being independent.

· Excluded volume effects.

· No physical basis for assuming such an equilibrium.

· Changes in structure with T are small and not like those implied by the Boltzmann distribution.


We thought about this …

… and wrote a paper saying

“It’s not true, but it sort of works”


Then we had a better idea – could we dispense with the reverse Boltzmann formalism?


· Instead of assuming a formula that relates the distance distribution to the binding free energy …

… use machine learning to learn the relationship from known structures and binding affinities.


· And persuade someone to pay for it!
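To hand structures to a learning algorithm, each complex first has to become a fixed-length descriptor vector. One simple way to do that, sketched here, is to count protein-ligand atom pairs of each element-element type within a distance cutoff. The element lists, the 12 Å cutoff and the function name `pair_count_features` are illustrative assumptions, not details stated in the slides.

```python
import math
from itertools import product

# Illustrative element types (assumed, deliberately short)
PROTEIN_TYPES = ("C", "N", "O", "S")
LIGAND_TYPES = ("C", "N", "O")
FEATURE_NAMES = sorted(product(PROTEIN_TYPES, LIGAND_TYPES))

def pair_count_features(protein_atoms, ligand_atoms, cutoff=12.0):
    """Return one count per (protein element, ligand element) pair type.

    Atoms are (element, (x, y, z)); cutoff is in Angstroms (assumed value).
    """
    counts = {pair: 0 for pair in FEATURE_NAMES}
    for p_el, p_xyz in protein_atoms:
        for l_el, l_xyz in ligand_atoms:
            pair = (p_el, l_el)
            if pair in counts and math.dist(p_xyz, l_xyz) < cutoff:
                counts[pair] += 1
    # Fixed ordering so every complex yields a comparable vector
    return [counts[pair] for pair in FEATURE_NAMES]

# Toy complex with made-up coordinates
features = pair_count_features(
    [("N", (0.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))],
    [("O", (3.0, 0.0, 0.0))],
)
```

Each vector, paired with the measured binding affinity of its complex, becomes one training example; no functional form linking distances to energy is assumed.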

[Figure: Nitrogen-Oxygen Distance Distribution (x-axis: Distance / Ångströms, 2 to 8; y-axis: Number observed, 0 to 1200) → Random Forest → Predicted binding affinity]


Random Forest

● Introduced by Breiman and Cutler (2001)

● Development of Decision Trees (Recursive Partitioning):

● Dataset is partitioned into consecutively smaller subsets

● Each partition is based upon the value of one descriptor

● The descriptor used at each split is selected so as to optimise splitting

● Bootstrap sample of N objects chosen from the N available objects with replacement
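The two ingredients listed above, bootstrap sampling and choosing a split on one descriptor, can be sketched as follows. This is a minimal illustration for a regression target: the split criterion (sum of squared errors), the helper names and the toy data are assumptions, and a full implementation would recurse on each partition.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw N rows from the N available, with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows):
    """Find the (score, descriptor index, threshold) that best splits the rows.

    rows: list of (descriptor_vector, target). 'Best' here means the lowest
    total squared error of the two resulting subsets (assumed criterion).
    """
    best = None
    for j in range(len(rows[0][0])):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best

# Toy data: descriptor 0 separates low from high targets, descriptor 1 does not
rows = [([1.0, 5.0], 2.0), ([2.0, 1.0], 2.1), ([3.0, 4.0], 6.0), ([4.0, 2.0], 6.2)]
split = best_split(rows)
sample = bootstrap_sample(rows, random.Random(0))
```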


· The Random Forest is just a forest of randomly generated decision trees …

… whose outputs are averaged to give the final prediction
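The averaging step can be illustrated with a deliberately tiny forest. In this sketch each "tree" is only a one-split stump on a single scalar descriptor, trained on its own bootstrap sample; the names `fit_stump`, `fit_forest` and `predict`, the median split and the toy data are assumptions for illustration, not the RF-Score implementation.

```python
import random
from statistics import mean

def fit_stump(rows):
    """A one-split 'tree': predict the mean target on each side of the median x."""
    xs = sorted(x for x, _ in rows)
    threshold = xs[len(xs) // 2]
    left = [y for x, y in rows if x < threshold]
    right = [y for x, y in rows if x >= threshold]
    overall = mean(y for _, y in rows)
    left_mean = mean(left) if left else overall
    right_mean = mean(right) if right else overall
    return lambda x: left_mean if x < threshold else right_mean

def fit_forest(rows, n_trees=50, seed=0):
    """Train each stump on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(rows)
    return [
        fit_stump([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_trees)
    ]

def predict(forest, x):
    """The forest's prediction is the average of its trees' outputs."""
    return mean(tree(x) for tree in forest)

# Toy data: low targets at small x, high targets at large x
rows = [(1.0, 2.0), (2.0, 2.2), (3.0, 1.9), (7.0, 6.0), (8.0, 6.3), (9.0, 5.8)]
forest = fit_forest(rows)
low, high = predict(forest, 1.0), predict(forest, 9.0)
```

Individual stumps differ because each sees a different bootstrap sample; averaging smooths out that variation.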


Building RF-Score

PDBbind 2007


Validation results: PDBbind set

· Following the method of Cheng et al., JCIM 49, 1079 (2009)

· Independent test set: PDBbind core 2007, 195 complexes from 65 clusters


· RF-Score outperforms competitor scoring functions, at least on our test

· RF-Score is available for free from our group website
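Comparisons of this kind are usually expressed as a correlation between predicted and measured binding affinities on the test set. The slides do not say which statistic was used, so the Pearson correlation below is an assumption, and the affinity values are made up purely to exercise the function.

```python
import math

def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)

# Made-up affinities for illustration only (not results from the talk)
measured = [4.0, 5.5, 6.1, 7.8, 9.0]
predicted = [4.4, 5.1, 6.5, 7.2, 8.6]
r = pearson_r(predicted, measured)
```

A value near 1 means the scoring function ranks and scales the test complexes much as experiment does; values near 0 mean it carries little information about affinity.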
