BUDE: GPU-Accelerated Molecular Docking for Drug Discovery · 2014-04-08 · The Bristol University...


BUDE

A General Purpose Molecular Docking Program Using OpenCL

Richard B Sessions


The molecular docking problem

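Proteins typically have O(1000) atoms; ligands typically have O(100) atoms.

[Figure: ligand and receptor combined into a predicted complex]

Docking has two parts:
1. Sampling the ligand pose over 6 degrees of freedom (three rotations, three translations): EMC
2. Binding affinity prediction: EFE-FF (Empirical Free Energy Forcefield)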
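Each sampled pose is a rigid-body placement of the ligand: three rotation angles and three translations. A minimal sketch of applying one pose to ligand coordinates, assuming Z-Y-X Euler angles in radians and rotation about the ligand centroid. The angle convention and the names `rotation_matrix` and `apply_pose` are illustrative assumptions, not BUDE's internal representation.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def rotation_matrix(rx: float, ry: float, rz: float) -> List[List[float]]:
    """Return R = Rz(rz) @ Ry(ry) @ Rx(rx) for angles in radians (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(coords: List[Vec3], pose: Tuple[float, ...]) -> List[Vec3]:
    """Rotate the ligand about its centroid by (rx, ry, rz), then translate by (tx, ty, tz)."""
    rx, ry, rz, tx, ty, tz = pose
    r = rotation_matrix(rx, ry, rz)
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]  # ligand centroid
    out = []
    for p in coords:
        d = [p[k] - c[k] for k in range(3)]  # position relative to the centroid
        out.append(tuple(
            sum(r[i][j] * d[j] for j in range(3)) + c[i] + t
            for i, t in enumerate((tx, ty, tz))
        ))
    return out
```

Internal (torsional) flexibility is not among these six degrees of freedom; it is covered separately by docking many conformers of each molecule.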

Empirical Free Energy Forcefield

An atom-atom based forcefield, parameterised according to atom type, analogous to standard molecular mechanics.

[Figure annotated "soft core"]

McIntosh-Smith, S., et al., Benchmarking Energy Efficiency, Power Costs and Carbon Emissions on Heterogeneous Systems. Computer Journal, 2012. 55(2): p. 192-205.
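An atom-atom forcefield scores a pose by summing a pair term over every receptor-ligand atom pair, with parameters looked up by atom type; with O(1000) receptor atoms and O(100) ligand atoms that is on the order of 10⁵ pair evaluations per pose. The sketch below shows only that structure. The parameter table, the cutoff and the particular soft-core repulsion (finite at r = 0 instead of diverging) are hypothetical placeholders, not BUDE's published functional form.

```python
import math
from typing import Dict, List, Tuple

Atom = Tuple[str, float, float, float]  # (atom type, x, y, z)

# Hypothetical per-type-pair parameters: (well depth, contact distance).
# A real empirical forcefield fits such values to experimental data.
PARAMS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("C", "C"): (0.2, 4.0),
    ("C", "O"): (0.3, 3.5),
    ("O", "C"): (0.3, 3.5),
    ("O", "O"): (0.5, 3.2),
}
CUTOFF = 8.0  # hypothetical distance beyond which a pair contributes nothing

def pair_energy(t1: str, t2: str, r: float) -> float:
    """Hypothetical soft-core pair term: finite at r = 0, attractive near contact."""
    depth, contact = PARAMS[(t1, t2)]
    if r >= CUTOFF:
        return 0.0
    if r < contact:
        # Soft-core repulsion: linear ramp from +depth at r = 0 to -depth at contact.
        return depth * (1.0 - 2.0 * r / contact)
    # Attraction decaying linearly from -depth at contact to 0 at the cutoff.
    return -depth * (CUTOFF - r) / (CUTOFF - contact)

def pose_energy(receptor: List[Atom], ligand: List[Atom]) -> float:
    """Sum the pair term over all receptor x ligand atom pairs."""
    total = 0.0
    for rt, rx, ry, rz in receptor:
        for lt, lx, ly, lz in ligand:
            r = math.sqrt((rx - lx) ** 2 + (ry - ly) ** 2 + (rz - lz) ** 2)
            total += pair_energy(rt, lt, r)
    return total
```

Keeping the repulsion finite means a badly overlapping trial pose gets a large but bounded penalty, so a sampler can still move it out of the clash.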

Re-docking a ligand into the X-ray structure (a good prediction has a low RMSD)

1CIL (human carbonic anhydrase II): RMSD ~ 0.2 Å

Another example

1EZQ (human factor Xa): RMSD ~ 1.2 Å
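In re-docking, the crystallographic ligand pose is the reference, and the RMSD of the docked pose measures how far the prediction lies from it, in Å. A minimal sketch, assuming both poses list the same atoms in the same order (no correction for symmetry-equivalent atoms) and that, as in re-docking, both are already in the receptor's frame, so no superposition is applied. The function name is illustrative.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def rmsd(predicted: List[Vec3], reference: List[Vec3]) -> float:
    """Root-mean-square deviation between two poses of the same ligand.

    No superposition is done: in re-docking the receptor frame is fixed,
    so the docked pose is compared directly with the crystal pose.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq = sum(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for p, q in zip(predicted, reference)
    )
    return math.sqrt(sq / len(reference))
```

A pose displaced rigidly by 1 Å along any axis has an RMSD of exactly 1 Å, which gives a feel for the 0.2 Å and 1.2 Å values quoted above.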

Accuracy of Pose Prediction (re-docking the BindingDB validation set, 84 complexes)

www.bindingdb.org


Binding Energy Prediction: is BUDE any better?

[Chart: Mike Hann’s 2006 test of docking software]

Yes, better but not perfect!

BUDE Simplified Flow Diagram (C++/OpenCL)

[Figure: flow diagram. Recoverable boxes, in order: Start BUDE; Enter initial data; Act on option (Print help, Write control file, End BUDE, or Docking); Data reading, with an error check; Prepare data for docking, with an error check; Docking type (surface docking or site docking), with Generate surface pairs; Do docking; Parallel code? (host job or accelerated job); Do generation: Calculate energies, Rank energies, EMC, repeated until the last generation; Score results; Print results; End BUDE.]

BUDE’s heterogeneous approach

1. Discover all OpenCL platforms and devices, including both CPUs and GPUs.
2. Run a micro benchmark on each device, ideally a short piece of real work.
3. Load balance using the micro benchmark results.
4. Re-run the micro benchmark at regular intervals in case the load changes.
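Steps 2 and 3 amount to dividing each batch of work in proportion to measured device throughput. A minimal sketch, assuming the micro benchmark reports a poses-per-second rate for each device; the device names, the function `split_work` and the largest-remainder rounding are illustrative choices, not taken from BUDE.

```python
from typing import Dict

def split_work(total_poses: int, benchmark_rates: Dict[str, float]) -> Dict[str, int]:
    """Split total_poses across devices in proportion to measured throughput.

    benchmark_rates maps a device name to the poses/second measured by a short
    micro benchmark. Rounding uses the largest-remainder method, so the shares
    always sum to total_poses.
    """
    total_rate = sum(benchmark_rates.values())
    if total_rate <= 0:
        raise ValueError("need at least one device with a positive rate")
    exact = {d: total_poses * r / total_rate for d, r in benchmark_rates.items()}
    shares = {d: int(x) for d, x in exact.items()}
    leftover = total_poses - sum(shares.values())
    # Hand out the remaining poses one each, largest fractional part first.
    for d in sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)[:leftover]:
        shares[d] += 1
    return shares
```

For example, `split_work(100, {"cpu": 1.0, "gpu0": 3.0, "gpu1": 3.0})` gives the CPU 14 poses and each GPU 43. Step 4 then only requires calling the function again with freshly measured rates.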


BUDE’s Three Docking Modes

• Virtual Screening by Docking
• Binding Site Prediction
• Protein-Protein Docking in real space

Virtual Screening by Docking

Virtual screening by docking against NDM-1 (New Delhi metallo-β-lactamase-1)

• 8 million ZINC candidate drug molecules, 20 conformers each: 160M dockings
• EMERALD (STFC-funded machine in Oxford)
• 372 GPUs
• 2.4 × 10¹⁷ atom-atom energies calculated
• ~60 hours actual wall time
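As a rough check on the scale of this run, the figures above imply about 1.1 × 10¹² atom-atom energies per second across the machine, or about 3 × 10⁹ per GPU. The arithmetic below assumes all 372 GPUs were busy for the whole ~60 hours, so the per-GPU figure is an average over the run rather than a measured peak.

```python
# Figures quoted on the slide.
molecules = 8_000_000
conformers_per_molecule = 20
atom_pair_energies = 2.4e17
wall_time_s = 60 * 3600  # "~60 hours", so the rates below are approximate
gpus = 372

dockings = molecules * conformers_per_molecule
aggregate_rate = atom_pair_energies / wall_time_s  # pair energies per second, whole machine
per_gpu_rate = aggregate_rate / gpus               # assumes every GPU busy throughout

print(f"{dockings:,} dockings")
print(f"{aggregate_rate:.2e} pair energies/s in total")
print(f"{per_gpu_rate:.2e} pair energies/s per GPU")
```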

BUDE’s EMC in Action

Virtual Screening for Ligands to Stabilise a Protein

Screened 160 million conformations from the ZINC database of 8 million drug-like compounds against 5 different conformations of the protein on EMERALD.

Selected and tested 58 compounds with two types of experimental assay and found 18 compounds binding in the 10 to 100 µM range: a 31% hit rate.
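The hit rate is 18/58. To put the 10 to 100 µM range on the free-energy scale that a docking forcefield aims to predict, a dissociation constant can be converted with ΔG = RT ln(K_d / 1 M). A minimal sketch, assuming the measured affinities are dissociation constants at 25 °C (the slide does not say which quantity was measured); at that temperature the range corresponds to roughly −6.8 to −5.5 kcal/mol. The function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def hit_rate(hits: int, tested: int) -> float:
    """Fraction of experimentally tested compounds that were confirmed binders."""
    return hits / tested

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol, dG = RT ln(Kd / 1 M).

    More negative values mean tighter binding.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)

print(f"hit rate: {hit_rate(18, 58):.1%}")
for kd in (10e-6, 100e-6):
    print(f"Kd = {kd * 1e6:.0f} uM -> dG = {binding_free_energy(kd):.1f} kcal/mol")
```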

A New Virtual Screen against a Key Protein from the Malaria Parasite

• BlueCrystal P3: 76 Nvidia K20s
• EMERALD: 372 Nvidia M2090s

Binding Site Identification

Full rotation and limited translation of the ligand at each receptor surface vector.
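One way to realise "full rotation and limited translation" is to pair uniformly random orientations with small displacements around each surface site; each candidate would then be scored with the forcefield and refined. A minimal sketch, assuming each surface vector is reduced to a point on the receptor surface (its direction is ignored here). Random sampling, the cube-shaped translation window and all names are illustrative assumptions; the slides do not describe BUDE's sampling scheme at this level.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]

def random_rotation(rng: random.Random) -> Quat:
    """Uniformly distributed unit quaternion (Shoemake's method): full rotation."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2),
            b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3))

def site_search_poses(surface_points: List[Vec3], poses_per_point: int,
                      max_shift: float, seed: int = 0) -> List[Tuple[int, Quat, Vec3]]:
    """Candidate placements (surface index, orientation, ligand centre).

    Any orientation is allowed, but the ligand centre moves at most
    max_shift along each axis from its surface point (limited translation).
    """
    rng = random.Random(seed)
    poses = []
    for i, (x, y, z) in enumerate(surface_points):
        for _ in range(poses_per_point):
            centre = (x + rng.uniform(-max_shift, max_shift),
                      y + rng.uniform(-max_shift, max_shift),
                      z + rng.uniform(-max_shift, max_shift))
            poses.append((i, random_rotation(rng), centre))
    return poses
```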

Location of the Binding Site of PI3P on a Protein (Homology Model) Involved in Insulin Signalling

Thomas & Tavare

Protein-Protein Docking (in real space)

Each point on the ligand is offered to each point on the receptor, with a local mini-dock: complete rotation about Z, rocking about X and Y, and small translations in X, Y and Z.
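For each pairing of a ligand point with a receptor point, the mini-dock is a small local search. A minimal sketch that enumerates such a search as an evenly spaced grid of offsets, assuming 12 steps for the full turn about Z, ±10° of rocking and ±1 Å shifts; these step counts and ranges are hypothetical defaults, not BUDE's settings. With these defaults the grid holds 12 × 3 × 3 × 3 × 3 × 3 = 2916 offsets, and that count multiplies the outer loop over ligand points × receptor points, which is why the local search has to stay small.

```python
import itertools
import math
from typing import Iterator, List, Tuple

def mini_dock_grid(z_steps: int = 12, rock_deg: float = 10.0, rock_steps: int = 3,
                   shift: float = 1.0, shift_steps: int = 3) -> Iterator[Tuple[float, ...]]:
    """Yield (rz, rx, ry, dx, dy, dz) offsets for one ligand-point/receptor-point pair.

    rz covers a complete turn about Z; rx and ry rock within +/- rock_deg;
    dx, dy, dz are small shifts within +/- shift. Angles are in radians.
    """
    def span(half_width: float, n: int) -> List[float]:
        # n evenly spaced values from -half_width to +half_width (just 0.0 if n == 1).
        if n == 1:
            return [0.0]
        return [-half_width + 2 * half_width * k / (n - 1) for k in range(n)]

    rz_values = [2 * math.pi * k / z_steps for k in range(z_steps)]  # full turn, 2*pi not repeated
    rock = span(math.radians(rock_deg), rock_steps)
    shifts = span(shift, shift_steps)
    yield from itertools.product(rz_values, rock, rock, shifts, shifts, shifts)
```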

Protein-Protein Docking Example: the Leucine Zipper Coiled Coil

Best-energy pose: RMSD = 0.2 Å

In a “real” case with Pete Cullen’s group, we mapped a protein-protein interface using BUDE and confirmed it experimentally. This took only 20 site-directed mutations, instead of the hundreds required by full alanine-scanning mutagenesis.

Performance across devices

High performance in silico virtual drug screening on many-core processors.
Simon McIntosh-Smith, James Price, Richard B. Sessions & Amaurys A. Ibarra
International Journal of High Performance Computing Applications (accepted for publication)

[Chart: performance across devices; one configuration is labelled "16 cores @ 3.1 GHz"]

Main Optimisations

[Chart: instruction mix in the innermost loop of the energy calculation, conditional accumulation vs predicated accumulation]
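The two accumulation styles named above can be illustrated as follows. In the conditional form each atom pair takes a branch; in the predicated form every pair executes the same instructions and a 0/1 mask zeroes the out-of-range contributions, which avoids divergent branches across SIMD lanes or GPU work-items. Python gains nothing from this; the sketch only shows that both forms compute the same sum. The linear stand-in pair term and the cutoff are hypothetical.

```python
from typing import List

def energy_conditional(distances: List[float], cutoff: float) -> float:
    """Branch per pair: only in-range pairs are added to the sum."""
    total = 0.0
    for r in distances:
        if r < cutoff:
            total += cutoff - r  # hypothetical stand-in for the real pair term
    return total

def energy_predicated(distances: List[float], cutoff: float) -> float:
    """No branch: evaluate the term for every pair and multiply by a 0/1 mask."""
    total = 0.0
    for r in distances:
        in_range = float(r < cutoff)      # 1.0 if inside the cutoff, else 0.0
        total += in_range * (cutoff - r)  # out-of-range pairs contribute zero
    return total
```

Predication trades some wasted arithmetic on out-of-range pairs for uniform control flow, which pays off when many lanes execute in lockstep.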


Optimisations


Summary

• GPUs and machines like Emerald are enabling new science

• BUDE is promising a step-change in Molecular Docking

• But plenty more developments and improvements are possible!

Acknowledgements

Amaurys Avila Ibarra
Simon N McIntosh-Smith
James Price
Debbie K Shoemark
EMERALD and the e-Infrastructure South Consortium, UK
BlueCrystal and the Advanced Computing Research Centre (Bristol)

On the shoulders of giants ...

Emil Fischer (1852-1919): ‘Lock and Key’
Willard Gibbs (1839-1903): Gibbs Free Energy, G = H − TS

Supplementary Slides

Structure and Binding Energy Prediction: Speed vs Accuracy Trade-off

Feature                     Typical docking      Empirical Free Energy    Free energy calculations
                            scoring functions    Forcefield (BUDE)        (MM [1,2], QM/MM [3])
Entropy: solvation          No                   Yes                      Yes
Entropy: configurational    Approx               Approx                   Yes
Electrostatics              ?                    Approx                   Yes
All atom                    No                   Yes                      Yes
Explicit solvent            No                   No                       Yes

From left to right, accuracy increases and speed decreases.

1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)

EMC Genetic Algorithm minimiser


Receptor and Ligand Flexibility

Protein:
• Backbone: dock to selected X-ray or MD structures
• Side chains: sample side-chain rotamers during docking

Small molecule: generate and dock many different conformations

Full flexibility would amount to molecular dynamics; limited flexibility is appropriate for molecular docking, e.g. the ZINC database of 8M drug-like compounds yields 160M conformers.
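Because each compound is represented by many conformers (and may be docked against several receptor structures), the per-pose results must be reduced to one score per compound before ranking. A minimal sketch, assuming lower energy means better binding and that the single best pose represents the compound; the record layout and the names `best_per_compound` and `rank` are hypothetical, and BUDE's own scoring may aggregate differently.

```python
from typing import Dict, List, Tuple

# (compound id, conformer index, receptor structure index, predicted energy)
Row = Tuple[str, int, int, float]

def best_per_compound(scores: List[Row]) -> Dict[str, float]:
    """Lowest energy per compound: its best conformer against its best receptor structure."""
    best: Dict[str, float] = {}
    for compound, _conformer, _receptor, energy in scores:
        if compound not in best or energy < best[compound]:
            best[compound] = energy
    return best

def rank(best: Dict[str, float], top: int) -> List[Tuple[str, float]]:
    """Most favourable (lowest) predicted energies first."""
    return sorted(best.items(), key=lambda kv: kv[1])[:top]
```

Taking the minimum favours compounds with at least one excellent pose; averaging over conformers would instead favour compounds that bind consistently, so the choice affects which compounds are selected for testing.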

[Table: EMC genetic algorithm parameters. Recoverable column headings: Seed, Parents, Selected By, Flag, Generation, Size, Output Descriptors, Output Coordinates, Mutation Method, Parameter (three columns)]

EMC Genetic Algorithm
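The slides identify EMC as a genetic-algorithm minimiser and list parameter headings such as seed, parents, selection and mutation method, but do not describe its operators. The sketch below is a generic elitist genetic algorithm over six-dimensional poses that follows the flow diagram's loop (calculate energies, rank energies, breed the next generation, repeat until the last generation). Truncation selection, uniform crossover, Gaussian mutation, the search range and all default values are assumptions, not EMC's actual design.

```python
import random
from typing import Callable, List

Pose = List[float]  # six numbers: three rotations, three translations

def genetic_minimise(energy: Callable[[Pose], float], generations: int = 50,
                     size: int = 40, parents: int = 10, mutation: float = 0.3,
                     seed: int = 1) -> Pose:
    """Generic elitist genetic algorithm over 6-DOF poses; lower energy is better."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(6)] for _ in range(size)]
    for _ in range(generations):
        population.sort(key=energy)   # calculate and rank energies
        elite = population[:parents]  # the best poses survive unchanged
        children = []
        while len(elite) + len(children) < size:
            a, b = rng.sample(elite, 2)                            # choose two parents
            child = [rng.choice(pair) for pair in zip(a, b)]       # uniform crossover
            child = [g + rng.gauss(0.0, mutation) for g in child]  # Gaussian mutation
            children.append(child)
        population = elite + children
    return min(population, key=energy)
```

Because the elite is carried over unchanged, the best energy never gets worse from one generation to the next. A toy check is a quadratic bowl such as `lambda p: sum((x - t) ** 2 for x, t in zip(p, target))` for some target pose.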


BUDE Algorithm
