John Mitchell; James McDonagh; Neetika Nath

Rob Lowe; Richard Marchese Robinson

RF-Score: a Machine Learning Scoring Function for Protein-Ligand Binding Affinities

• Ballester, P.J. & Mitchell, J.B.O. (2010) Bioinformatics 26, 1169-1175

Calculating the affinities of protein-ligand complexes:

For docking

For post-processing docking hits

For virtual screening

For lead optimisation

For 3D QSAR

Within series of related complexes

For any general complex

Absolute (hard!)

Relative

A difficult, unsolved problem.

Three existing approaches …

1. Force fields


2. Empirical Functions


3. Knowledge based

How knowledge-based scoring functions have worked …

P-L complexes from PDB

Assign atoms to types

Find histograms of type-type distances

Convert to an ‘energy’

Add up the energies from all P-L atom pairs

This conversion of the histogram into an energy function uses a “reverse Boltzmann” methodology.
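
As a rough illustration of the histogram-to-energy step, the sketch below applies the usual inverse Boltzmann form, E(r) = -kT ln[p_obs(r) / p_ref(r)], to a single type-type histogram and then sums the resulting terms over atom pairs. The function names, the reference histogram, the 1 Å bin width, the pseudocount and the toy counts are all assumptions made for illustration; they are not taken from any particular published scoring function.

```python
import math

def inverse_boltzmann(observed_counts, reference_counts, kT=0.593, pseudocount=1e-6):
    """Convert a distance histogram into one pseudo-energy per bin.

    E(r) = -kT * ln(p_obs(r) / p_ref(r))
    kT = 0.593 kcal/mol is roughly room temperature (an illustrative choice).
    The pseudocount avoids log(0) for empty bins (also an assumption).
    """
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    energies = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        p_obs = n_obs / total_obs + pseudocount
        p_ref = n_ref / total_ref + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

def score_pairs(pair_distances, energies, bin_width=1.0):
    """Add up the per-bin energies for every atom pair of this type pair."""
    total = 0.0
    for r in pair_distances:
        bin_index = int(r // bin_width)
        if bin_index < len(energies):  # pairs beyond the histogram contribute nothing
            total += energies[bin_index]
    return total

# Hypothetical counts for one protein-ligand type pair, 1 Å bins from 0 to 6 Å
observed = [0, 2, 30, 45, 20, 15]
reference = [5, 15, 20, 25, 25, 22]
energies = inverse_boltzmann(observed, reference)
pair_score = score_pairs([2.5, 3.2], energies)
```

Bins that are over-populated relative to the reference come out favourable (negative), and the kT prefactor is exactly where the Boltzmann picture enters.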

Thus it “assumes” that the atoms of protein and ligand are independent particles in equilibrium at temperature T.

For a variety of reasons, these are poor assumptions …

Molecular connectivity: atom-atom distances are miles from being independent.

Excluded volume effects.

No physical basis for assuming such an equilibrium.

Changes in structure with T are small and not like those implied by the Boltzmann distribution.

We thought about this …

… and wrote a paper saying

“It’s not true, but it sort of works”


Then we had a better idea – could we dispense with the reverse Boltzmann formalism?

Instead of assuming a formula that relates the distance distribution to the binding free energy …

… use machine learning to learn the relationship from known structures and binding affinities.


And persuade someone to pay for it!

Random Forest

Predicted binding affinity

Random Forest

● Introduced by Breiman and Cutler (2001)

● Development of Decision Trees (Recursive Partitioning):

● Dataset is partitioned into consecutively smaller subsets

● Each partition is based upon the value of one descriptor

● The descriptor used at each split is selected so as to optimise splitting

● Each tree is grown on a bootstrap sample of N objects, chosen from the N available objects with replacement

The Random Forest is just a forest of randomly generated decision trees …

… whose outputs are averaged to give the final prediction
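
The ingredients listed above (recursive partitioning on one descriptor at a time, bootstrap sampling, averaging over trees) can be put together in a few lines. The following is a minimal pure-Python sketch for regression, not the implementation used for RF-Score; the function names, the `mtry` parameter (number of descriptors tried at each split), the stopping rule and the toy data are all illustrative assumptions.

```python
import random

def sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(rows, targets, features):
    """Pick the (descriptor, threshold) that minimises the squared error."""
    best, best_err = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if left and right:
                err = sse(left) + sse(right)
                if err < best_err:
                    best, best_err = (f, t), err
    return best

def grow_tree(rows, targets, mtry, rng, min_size=2):
    """Recursively partition the data; a leaf stores the mean target."""
    if len(rows) < min_size or len(set(targets)) == 1:
        return sum(targets) / len(targets)
    features = rng.sample(range(len(rows[0])), mtry)  # random descriptor subset
    split = best_split(rows, targets, features)
    if split is None:  # no descriptor separates the remaining rows
        return sum(targets) / len(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], mtry, rng, min_size),
            grow_tree([r for r, _ in right], [y for _, y in right], mtry, rng, min_size))

def predict_tree(node, row):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(rows, targets, n_trees=50, mtry=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: N of N, with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Average the outputs of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Hypothetical toy data: two descriptors, only the first one is informative
rows = [[float(x), 0.0] for x in range(10)]
targets = [0.0] * 5 + [10.0] * 5
forest = grow_forest(rows, targets, n_trees=50, mtry=2)
low = predict_forest(forest, [0.0, 0.0])
high = predict_forest(forest, [9.0, 0.0])
```

Because every tree sees a different bootstrap sample, individual trees disagree slightly; averaging smooths out that variation.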

Building RF-Score

PDBbind 2007
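
The slide itself does not spell out the descriptors fed to the Random Forest. As a hedged sketch, assume each complex is described by simple counts of protein-ligand element pairs lying within a distance cutoff, in the spirit of the cited Bioinformatics paper. The element lists, the 12 Å cutoff, the function name and the toy coordinates below are illustrative assumptions, not a transcription of the released RF-Score code.

```python
import math
from itertools import product

# Assumed element lists and cutoff, for illustration only
PROTEIN_ELEMENTS = ["C", "N", "O", "S"]
LIGAND_ELEMENTS = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
CUTOFF = 12.0  # Å

def contact_counts(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Count protein-ligand element pairs closer than the cutoff.

    Each atom is an (element, (x, y, z)) tuple. The result maps
    (protein_element, ligand_element) to a count and can be flattened
    into one descriptor vector per complex.
    """
    counts = {pair: 0 for pair in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)}
    for p_el, p_xyz in protein_atoms:
        for l_el, l_xyz in ligand_atoms:
            if (p_el, l_el) in counts and math.dist(p_xyz, l_xyz) < cutoff:
                counts[(p_el, l_el)] += 1
    return counts

# Hypothetical toy complex
protein = [("C", (0.0, 0.0, 0.0)), ("N", (20.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0)), ("C", (0.0, 4.0, 0.0))]
features = contact_counts(protein, ligand)
descriptor_vector = [features[pair] for pair in sorted(features)]
```

Descriptor vectors like this, paired with measured affinities from a set such as PDBbind, are what a regression model would be trained on.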


Validation results: PDBbind set

Following the method of Cheng et al., JCIM 49, 1079 (2009)

Independent test set: PDBbind core 2007, 195 complexes from 65 clusters
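
The slide does not state which statistic was used for the comparison. One common way to judge a scoring function on a held-out test set is Pearson's correlation between predicted and measured affinities, and the sketch below assumes that choice. The function name and the numbers are hypothetical and are not results from this work.

```python
import math

def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)

# Hypothetical predicted and measured affinities (e.g. pKd) for a tiny test set
predicted = [5.1, 6.3, 7.0, 4.2, 8.1]
measured = [4.8, 6.9, 6.5, 4.0, 8.4]
r = pearson_r(predicted, measured)
```

The test complexes must be kept out of training for such a number to mean anything, which is the point of the independent core set.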


RF-Score outperforms competitor scoring functions, at least on our test

RF-Score is available for free from our group website
