13
An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity Majid Masso School of Systems Biology, George Mason University Manassas, Virginia 20110, USA CSBW – BIBM 2012, Philadelphia, Pennsylvania

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

  • Upload
    jaxon

  • View
    42

  • Download
    0

Embed Size (px)

DESCRIPTION

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity. Majid Masso School of Systems Biology, George Mason University Manassas, Virginia 20110, USA CSBW – BIBM 2012, Philadelphia, Pennsylvania. Knowledge-Based Potentials of Mean Force. - PowerPoint PPT Presentation

Citation preview

Page 1: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Majid Masso

School of Systems Biology, George Mason University

Manassas, Virginia 20110, USA

CSBW – BIBM 2012, Philadelphia, Pennsylvania

Page 2: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Knowledge-Based Potentials of Mean Force

• Generated via statistical analysis of observed features in a diverse training set of structures selected from the PDB

• Alternative to physics or molecular mechanics energy functions

• Assumption: observed features follow a Boltzmann distribution

• Examples:

– Well-documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level

– This study: inclusion of higher-order contributions by developing an all-atom four-body statistical potential

• Motivation (our prior work):

– Four-body protein potential at the amino acid level

Page 3: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Motivational Example:Pairwise Amino Acid Potential

• The 20-letter protein alphabet yields 210 residue pairs

• Obtain a diverse PDB training set of single protein chains; represent each protein as a set of amino acid points in 3D

• For each residue pair (i, j), calculate the relative frequency fij with which they appear within a given distance (e.g., 12 angstroms) of each other in all the protein structures

• Calculate a rate pij expected by chance alone by using a background or reference distribution (more later…)

• Apply inverted Bolzmann principle: sij = log(fij / pij) quantifies interaction propensity and is proportional to the energy of interaction (by a factor of ‘–RT’) for the pair

Page 4: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

All-Atom Four-Body Statistical Potential

• Diverse PDB training set of 1417 single chain and multimeric proteins, many complexed to ligands (see paper for text file)

• Six-letter atomic alphabet: C, N, O, S, M (metals), X (other)• Apply Delaunay tessellation to the atomic point coordinates of

each PDB file – objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)

Page 5: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

All-Atom Four-Body Statistical Potential

• The six-letter atomic alphabet yields 126 distinct quadruplets• Calculate observed rate fijkl of quad (i, j, k, l) occurrence among all tetrahedra from the 1417 structure tessellations• Compute rate pijkl expected by chance from a multinomial reference distribution:

• an = proportion of atoms from all structures that are of type n

• tn = number of occurrences of atom type n in the quad

.4 and 1 where,

!

!4 6

1

6

1

6

16

1

nn

nn

n

tn

nn

ijkl taa

t

p n

Page 6: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Summary Data for the 1417 Structure Files and their Delaunay Tessellations

Atom Types Count Proportion

C 3,612,988 0.633193 N 969,253 0.169866 O 1,088,410 0.190749 S 28,502 0.004995

(all metals) M 2,529 0.000443 (all other non-metals) X 4,299 0.000754

Total atom count: 5,705,981

Total tetrahedron count: 34,504,737

Page 7: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

All-Atom Four-Body Statistical PotentialQuad Count sijkl Quad Count sijkl Quad Count sijkl Quad Count sijkl CCCC 4015872 -0.14024 CMOX 68 0.308765 MMNX 0 -- NNOS 5637 -0.30523 CCCM 1592 -0.98922 CMSS 2047 2.848813 MMOO 306 2.31553 NNOX 302 -0.75477 CCCN 4025206 -0.16987 CMSX 13 1.172117 MMOS 104 3.127729 NNSS 311 0.319427 CCCO 6202159 -0.03247 CMXX 6 1.958862 MMOX 3 2.409325 NNSX 6 -0.87471 CCCS 293157 0.224008 CNNN 102035 -0.62305 MMSS 254 5.398477 NNXX 5 0.168652 CCCX 2796 -0.97505 CNNO 1995038 0.140679 MMSX 2 3.815151 NOOO 171147 0.021937 CCMM 132 0.908235 CNNS 15892 -0.37618 MMXX 0 -- NOOS 10697 -0.07737 CCMN 3318 -0.57598 CNNX 578 -0.99392 MNNN 1030 0.53596 NOOX 3102 0.206513 CCMO 5325 -0.42089 CNOO 2734639 0.227273 MNNO 1128 0.047955 NOSS 922 0.440012 CCMS 2293 0.795108 CNOS 95438 0.050981 MNNS 561 1.326526 NOSX 12 -0.92506 CCMX 15 -0.5677 CNOX 2168 -0.77117 MNNX 5 0.098041 NOXX 61 0.903627 CCNN 1797552 -0.12464 CNSS 4264 0.584024 MNOO 3744 0.518626 NSSS 33 1.052833 CCNO 8233136 0.184864 CNSX 37 -0.95711 MNOS 314 0.723107 NSSX 0 -- CCNS 124653 -0.05308 CNXX 61 0.382553 MNOX 29 0.510083 NSXX 0 -- CCNX 2007 -1.02473 COOO 524994 -0.06271 MNSS 793 3.008398 NXXX 3 2.475964 CCOO 3366568 0.047161 COOS 34429 -0.14114 MNSX 5 1.328573 OOOO 34212 -0.12555 CCOS 198630 0.098905 COOX 23801 0.520038 MNXX 9 2.706383 OOOS 4240 -0.0525 CCOX 4626 -0.71243 COSS 4380 0.545326 MOOO 5430 1.106856 OOOX 9553 1.121777 CCSS 15288 0.868158 COSX 58 -0.81224 MOOS 156 0.669977 OOSS 300 0.203077 CCSX 144 -0.63735 COXX 65 0.359781 MOOX 168 1.523669 OOSX 36 -0.19726 CCXX 143 0.482159 CSSS 285 1.417735 MOSS 210 2.380989 OOXX 128 1.476181 CMMM 23 3.480397 CSSX 5 0.006247 MOSX 4 1.181307 OSSS 38 1.063748 CMMN 144 1.216422 CSXX 4 0.730845 MOXX 55 3.442148 OSSX 3 0.305472 CMMO 256 1.415945 CXXX 9 2.381656 MSSS 62 3.910199 OSXX 0 -- CMMS 662 3.41048 MMMM 83 7.794725 MSSX 2 2.763224 OXXX 2 2.249518 CMMX 1 1.41113 MMMN 37 4.258301 MSXX 0 -- SSSS 6 2.446092 CMNN 2474 -0.13203 MMMO 29 4.102142 MXXX 16 5.786451 SSSX 0 -- CMNO 6267 -0.07975 MMMS 379 6.8003 NNNN 3878 -0.8697 SSXX 0 -- CMNS 2588 1.118068 MMMX 0 -- NNNO 46665 -0.44173 SXXX 0 -- CMNX 26 -0.05842 MMNN 83 1.849597 NNNS 460 -0.86605 XXXX 0 -- CMOO 8481 0.302308 MMNO 102 1.587734 NNNX 34 -1.17582 CMOS 1010 0.659069 MMNS 363 3.720958 NNOO 340620 0.195102

Page 8: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Topological Score (TS)

• Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices

• Each simplex can be scored using the all-atom four-body potential based on the quad present at the four vertices

• Topological score (or ‘total potential’) of the structure: sum the scores of all constituent simplices in tessellation

sijkl TS = Σsijkl

Page 9: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Topological Score Difference (ΔTS)

Page 10: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Application of ΔTS: Predicting Protein – Ligand Binding Energy

• MOAD – repository of exp. dissociation constants (kd) for protein–ligand complexes whose structures are in PDB

• Collected kd values for 300 complexes reflecting diverse protein structures

• Obtained exp. binding energy from kd via ΔGexp = –RTln(kd)

• Calculated ΔTS for complexes

Page 11: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Predicting Protein – Ligand Binding Energy

• Randomly selected 200 complexes to train a model

• Correlation coefficient r = 0.79 between ΔTS and ΔGexp

• Empirical linear transformation of ΔTS to reflect energy values:

ΔGcalc = L (ΔTS)

• Linear => same r = 0.79 value between ΔGcalc and ΔGexp

• Also, standard error of SE = 1.98 kcal/mol and fitted regression line of y = 1.18x (y = ΔGcalc and x = ΔGexp)

Page 12: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

Predicting Protein – Ligand Binding Energy

• For the test set of 100 remaining complexes:

• r = 0.79 between ΔGcalc and ΔGexp

• SE = 1.93 kcal/mol

• Fitted regression line is y = 1.11x – 0.63

• All training/test data is available online as a text file (see paper)

Page 13: An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

References and Acknowledgments

• PDB (structure DB): http://www.rcsb.org/pdb

• MOAD (ligand binding DB): http://bindingmoad.org/

• Qhull (Delaunay tessellation): http://www.qhull.org/

• UCSF Chimera (ribbon/ball-stick structure visualization): http://www.cgl.ucsf.edu/chimera/

• Matlab (tessellation visualization): http://www.mathworks.com/products/matlab/