Chemical space network topology through atom typing N. Sukumar Michael Krein

Chemical space network topology through atom typingacscinf.org/docs/meetings/238nm/presentations/238nm45.pdf · N. Sukumar and Curt M. Breneman, "QTAIM in Drug Discovery and Protein


Chemical space network topology through atom typing

N. Sukumar Michael Krein


Outline

• Motivation
• Background
• Enabling Technology
• Preliminary Results
• Future Directions


The figure depicts a cartoon representation of the relationship between the continuum of chemical space (light blue) and the discrete areas of chemical space that are occupied by compounds with specific affinity for biological molecules. Examples of such molecules are those from major gene families (shown in brown, with specific gene families colour-coded as proteases (purple), lipophilic GPCRs (blue) and kinases (red)). The independent intersection of compounds with drug-like properties, that is those in a region of chemical space defined by the possession of absorption, distribution, metabolism and excretion properties consistent with orally administered drugs — ADME space — is shown in green.

Christopher Lipinski & Andrew Hopkins, Nature, Vol. 432, 16 December 2004, pp. 855-861


Network Similarity Graphs for six classes of enzyme inhibitors

Wawer, Peltason, Weskamp, Teckentrup and Bajorath, J. Med. Chem. 2008, 51, 6075–6084


• How can we characterize these chemical subspaces?
• How are they related to, and how do they differ from, each other?
• What are the topological characteristics of the networks?
• How do structure-activity indices like SALI and SARI compare with network characteristics obtained from “purely molecular” similarity/dissimilarity measures without reference to biological activities?


What next?

• Motivation
• Background
• Enabling Technology
• Preliminary Results
• Future Directions


Network measures

• Degree k: the most elementary characteristic of a node, which tells us how many links the node has to other nodes.
• Degree distribution P(k): the probability that a selected node has exactly k links.
  – Obtained by counting the number of nodes with k = 1, 2, ... links and dividing by the total number of nodes.
  – Allows us to distinguish between different classes of networks.
• Clustering coefficient: $C_I = 2n_I / (k(k-1))$, where $n_I$ is the number of links connecting the k neighbors of node I to each other, i.e., the number of triangles that go through node I; $C_I$ is therefore the fraction of the k(k-1)/2 possible triangles that are actually present.
  – C(k) is the average clustering coefficient of all nodes with k links.
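The three measures above can be sketched in a few lines of Python. This is only an illustration, assuming the network is stored as a dictionary mapping each node to the set of its neighbors; the function names and the small example graph are hypothetical and not taken from the slides.

```python
from collections import Counter


def degree_distribution(adj):
    """P(k): fraction of nodes that have exactly k links."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    total = len(adj)
    return {k: c / total for k, c in sorted(counts.items())}


def clustering_coefficient(adj, node):
    """C_I = 2 n_I / (k (k - 1)), with n_I the links among the k neighbors."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no triangle can pass through a node with fewer than 2 links
    n_links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * n_links / (k * (k - 1))


def clustering_by_degree(adj):
    """C(k): average clustering coefficient over all nodes with k links."""
    buckets = {}
    for node, nbrs in adj.items():
        buckets.setdefault(len(nbrs), []).append(clustering_coefficient(adj, node))
    return {k: sum(vals) / len(vals) for k, vals in sorted(buckets.items())}


# Hypothetical example: a triangle A-B-C with one extra node D attached to A.
example_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

In the example graph, node A has three neighbors with one link among them (B to C), so one of its three possible triangles is present.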


Random and Scale-free networks

A.-L. Barabási, Linked: The New Science of Networks. Cambridge, MA: Plume Books, 2003.

Random Scale-free


Random networks

• The node degrees follow a Poisson distribution, indicating that most nodes have approximately the same number of links (close to the average degree).
• The tail of the degree distribution decreases exponentially, $P(k) \sim e^{-k}$, indicating that nodes that significantly deviate from the average are extremely rare.
• The mean path length is proportional to the logarithm of the network size, indicating the small-world property.


Scale-free networks

• Characterized by a power-law degree distribution: the probability that a node has k links follows $P(k) \sim k^{-\gamma}$.
  – Such distributions are seen as a straight line on a log–log plot.
• The probability that a node is highly connected is statistically more significant than in a random graph; the network's properties are often determined by a relatively small number of highly connected nodes (hubs).
• Scale-free networks with degree exponents between 2 and 3 (as in most biological and non-biological networks) are ultra-small, with the average path length following $\ell \sim \log \log N$, significantly shorter than the $\log N$ that characterizes random small-world networks.
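Because a power law appears as a straight line on a log–log plot, the exponent γ can be roughly estimated from the slope of that line. The sketch below uses an ordinary least-squares fit of log P(k) against log k. This is a deliberately simple estimator chosen for illustration, not necessarily the procedure used in this work, and the function name is hypothetical.

```python
import math


def fit_power_law_exponent(pk):
    """Estimate gamma in P(k) ~ k^(-gamma) from a dict {degree: probability}.

    Fits a straight line to (log k, log P(k)) by least squares and returns
    the negated slope. Entries with k <= 0 or P(k) <= 0 are ignored.
    """
    points = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points with k > 0 and P(k) > 0")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("all degrees are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx
```

A distribution passed in need not be normalized, since a constant factor shifts the line on the log–log plot without changing its slope.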


Genesis of a scale-free network

A.-L. Barabási, Linked: The New Science of Networks. Cambridge, MA: Plume Books, 2003.


Hierarchical networks

• Clusters combine in an iterative manner, generating a hierarchical network and accounting for the coexistence of modularity, local clustering and scale-free topology.
• The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows $C(k) \sim k^{-1}$, a straight line of slope -1 on a log–log plot.
• A hierarchical architecture implies that sparsely connected nodes are part of highly clustered areas, with communication between the different highly clustered neighborhoods being maintained by a few hubs.


Random, Scale-free and Hierarchical networks

[Figure: degree distribution P(k) and clustering coefficient C(k) for each class of network]


Topological robustness

• Disabling a substantial number of nodes in a random network results in functional disintegration: if a critical fraction of nodes is removed, a phase transition occurs, breaking the network into tiny, non-communicating islands of nodes.
• Scale-free networks do not have a critical threshold for disintegration; they are robust against accidental failures: even if 80% of randomly selected nodes fail, the remaining 20% still form a compact cluster with a path connecting any two nodes.
  – This is because random failure affects mainly the numerous low-degree nodes, the absence of which doesn't disrupt the network's integrity.
  – But this reliance on hubs induces vulnerability to targeted attack: the removal of a few key hubs splinters the system into small isolated node clusters.
• Complex systems, from the cell to the Internet, can be amazingly resilient against component failure, withstanding incapacitation of many individual components and many changes in external conditions.
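The contrast between accidental failure and targeted attack can be explored with a small simulation. The sketch below is an illustration under stated assumptions: the graph is a dictionary of neighbor sets, "functional integrity" is measured by the size of the largest connected component, and the toy generator attaches each new node to a single existing node chosen with probability proportional to its degree. All function names are hypothetical.

```python
import random


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


def random_failure(adj, n_remove, rng):
    """Pick n_remove nodes uniformly at random (accidental failure)."""
    return set(rng.sample(sorted(adj), n_remove))


def targeted_attack(adj, n_remove):
    """Pick the n_remove most highly connected nodes (attack on hubs)."""
    return set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:n_remove])


def preferential_attachment_graph(n, rng):
    """Toy growth model: each new node links to one degree-weighted target."""
    adj = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # each node appears once per link it has
    for new in range(2, n):
        target = rng.choice(endpoints)
        adj[new] = {target}
        adj[target].add(new)
        endpoints.extend([new, target])
    return adj
```

One way to use it is to build `g = preferential_attachment_graph(1000, random.Random(0))` and compare `largest_component_size(g, random_failure(g, 50, random.Random(1)))` with `largest_component_size(g, targeted_attack(g, 50))`. Because this toy generator produces a tree, it is more fragile than denser scale-free networks, so the numbers are indicative only.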


What next?

• Motivation
• Background
• Enabling Technology
• Preliminary Results
• Future Directions


Descriptors Similarity Network Molecular Structures

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors


Reconstruction Method

• Based on Bader’s Theory of Atoms in Molecules
• Exploits the approximate transferability of atomic and functional group properties from one molecule to another in a similar environment,
• providing a rapid means of computing electronic property information for bio-molecules and large molecular datasets,
• yielding throughputs of around half a million molecules per processor per hour.

Reconstruction Method

• For each atom in the molecule, determine its atom type and assign the closest match from the atom type library.
• Combine the densities of the atomic fragments, which are determined from ab initio computations on small molecules.
• Compute predicted molecular properties.

N. Sukumar and Curt M. Breneman, "QTAIM in Drug Discovery and Protein Modeling" in "The Quantum Theory of Atoms in Molecules: From Solid State to DNA and Drug Design" C.F. Matta & R.J. Boyd, Eds. (Wiley-VCH, 2007)


Atomtyper assignment algorithm

Assigns the closest match for each atom from the TAE library according to the following priority:

Level 3: Perfect match
Level 2: Ring size differs
Level 1: Hybridization of nearest neighbor does not match
Level 0: Atomic number of nearest neighbor does not match
Level -1: Hybridization of atom does not match
Level -2: For a monovalent atom, hybridization of nearest neighbor differs
Level -3: Atomic number of atom does not match


Electrostatic Potential

Politzer’s Local Average Ionization Potential


Surface Property Distribution RECON/TAE Descriptors

Surface electronic properties of each atom in the molecule combined to give molecular property distributions

PIP (Politzer’s Local Average Ionization Potential) surface property for a member of the Lombardo blood-brain barrier dataset.


Descriptors Similarity Network Molecular Structures

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors


An electron density derived similarity measure

• One similarity/distance measure employed 4 electron-density-derived properties from RECON to construct the network graph:
  – SIEP (surface integral of electrostatic potential)
  – SIEP max–min (the EP spread)
  – PIP average (Politzer’s local ionization potential)
  – PIP max–min (the PIP spread)
  – Inter-correlation < $10^{-6}$
• $R^2 = \Delta(\mathrm{SIEP})^2 + \Delta(\mathrm{EP\ spread})^2 + \Delta(\mathrm{PIP\ avg})^2 + \Delta(\mathrm{PIP\ spread})^2$ (after scaling)
• Three different thresholds were used: 0.3, 0.1 and 0.03

•  Molecules with distance less than (or similarity greater than) the threshold were connected by an edge of the network graph.
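The construction described above (scale the descriptors, compute the pairwise distance R, connect pairs below the threshold) can be sketched as follows. The slides do not say how the descriptors were scaled or whether the threshold applies to R or to R², so this sketch assumes min-max scaling of each descriptor to [0, 1] and a threshold on R. The function names and any descriptor values passed in are hypothetical.

```python
import math
from itertools import combinations


def min_max_scale(rows):
    """Scale each descriptor column to [0, 1] (assumed scaling scheme)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid divide-by-zero
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]


def threshold_graph(descriptors, threshold):
    """Connect two molecules when their scaled Euclidean distance R < threshold.

    `descriptors` maps a molecule name to its list of descriptor values,
    e.g. the four RECON properties or (logP, MR, PSA).
    """
    names = list(descriptors)
    scaled = dict(zip(names, min_max_scale([descriptors[n] for n in names])))
    adj = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if math.dist(scaled[a], scaled[b]) < threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

The all-pairs loop is quadratic in the number of molecules, so a database-scale calculation would need some form of spatial indexing or binning; that is outside the scope of this sketch.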


A physicochemical similarity measure

• Another measure employed 3 physicochemical properties (from babel):
  – logP (log octanol/water partition coefficient)
  – MR (molar refractivity)
  – PSA (polar surface area)
  – Inter-correlation < $10^{-6}$
• $R^2 = \Delta(\log P)^2 + \Delta(\mathrm{MR})^2 + \Delta(\mathrm{PSA})^2$ (after scaling)
• Three different thresholds were used: 0.3, 0.1 and 0.03

•  Molecules with distance less than (or similarity greater than) the threshold were connected by an edge of the network graph.


USR descriptors

• Ultrafast shape recognition (USR) is a similarity search tool developed by Pedro Ballester and Graham Richards, J. Comp. Chem. 28(10):1711-1723 (2007). It:
  – employs molecular shape moments with respect to:
    · the centroid (ctd),
    · the closest atom to the ctd (cst),
    · the farthest atom from the ctd (fct),
    · and the farthest atom from the fct (ftf);
  – is alignment-free, extremely fast, generates a compact shape profile and performs well at shape classification.
• We employed a Cartesian distance measure using the first moments with respect to ctd, cst and fct.
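As a rough illustration of the descriptor used here, the sketch below computes the first moment (the mean atomic distance) with respect to ctd, cst and fct from a list of 3D atomic coordinates, and compares two molecules by the Euclidean distance between the resulting 3-vectors. Interpreting "Cartesian distance" as Euclidean distance is an assumption, and the function names are hypothetical.

```python
import math


def centroid(coords):
    """Geometric center of a list of (x, y, z) atomic positions."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def usr_first_moments(coords):
    """Mean distance of all atoms to ctd, cst and fct, in that order."""
    ctd = centroid(coords)
    cst = min(coords, key=lambda c: math.dist(c, ctd))  # closest atom to ctd
    fct = max(coords, key=lambda c: math.dist(c, ctd))  # farthest atom from ctd

    def mean_distance(ref):
        return sum(math.dist(c, ref) for c in coords) / len(coords)

    return (mean_distance(ctd), mean_distance(cst), mean_distance(fct))


def usr_distance(coords_a, coords_b):
    """Euclidean distance between the first-moment vectors of two molecules."""
    return math.dist(usr_first_moments(coords_a), usr_first_moments(coords_b))
```

Because only distances to internally defined reference points are used, the result does not depend on how either molecule is positioned or oriented, which is what makes the approach alignment-free.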


What next?

• Motivation
• Background
• Enabling Technology
• Preliminary Results
• Future Directions


Network Topology of Chemical Space

• What is the characteristic degree dependence of a chemical space network?
• Do different subspaces within this chemical space possess noticeably different topology and, if so, what accounts for the difference(s)?
• How do characteristics compare between natural products, drug-like molecules and lead-like molecules? (e.g. from ZINC, PubChem, CAS Registry, WDI, MDDR, and other corporate pharmaceutical databases)


Mapping the ZINC dataset

[Figure: ZINC dataset mapped by logP, MR and PSA]

•  A database of over 13 million commercially-available compounds for virtual screening.

•  http://zinc.docking.org/


Scaling of network connectivity


Scaling of Connectivity

Different cutoffs are shown with different colors


Scaling of Connectivity

Different cutoffs are shown with different colors


ZINC Natural Products subset ~100K molecules; logP, MR, PSA


ZINC Natural Products subset ~100K molecules; USR descriptors


Atomtyper distance

For any pair of molecules we define:

– a discrete "Atomtyper distance": the number of atoms in one molecule that are well represented, to within a specified level of similarity, by the set of atom types present in the other; this is a measure of “scaffold similarity”;

– an "alchemical distance": the number of atoms that have to be added, deleted or substituted to transform one molecule to another, to within a specified level of similarity.
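The first definition can be read as a simple counting procedure. The sketch below assumes each molecule is given as a list of atom type labels and that a caller-supplied `fit_level(atom, library_type)` function returns the match level between two labels. The labels, the trivial `exact_fit` scorer and the function names are hypothetical placeholders, not the Atomtyper implementation.

```python
def exact_fit(atom_type, other_type):
    """Placeholder scorer: level 3 for identical labels, -3 otherwise."""
    return 3 if atom_type == other_type else -3


def atomtyper_count(mol_a, mol_b, fit_level=exact_fit, min_level=3):
    """Count atoms of mol_a represented, at min_level or better, by mol_b's types.

    mol_a and mol_b are lists of atom type labels, one label per atom.
    """
    types_b = set(mol_b)
    return sum(
        1
        for atom in mol_a
        if any(fit_level(atom, t) >= min_level for t in types_b)
    )


# Hypothetical atom type labels, for illustration only.
mol_a = ["C.ar", "C.ar", "N.am", "O.2"]
mol_b = ["C.ar", "O.2", "O.3"]
```

Note that the count is not symmetric: here three atoms of `mol_a` are represented by the types in `mol_b`, while only two atoms of `mol_b` are represented by the types in `mol_a`. Lowering `min_level` with a graded scorer would loosen the required similarity.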


Level selection threshold

Fit level designations and their corresponding meanings (X = attribute matches)

Level    | Element | Valency | Neighbors | Valency of Neighbors | Ring size
Level 3  | X       | X       | X         | X                    | X
Level 2  | X       | X       | X         | X                    |
Level 1  | X       | X       | X         |                      |
Level 0  | X       | X       |           |                      |
Level -1 | X       |         |           |                      |
Level -2 |         |         |           |                      |
Level -3 |         |         |           |                      |

A requested atom type string is compared to each atom type string in the TAE library list in succession until either (a) a level designation equal to 3 is found, or (b) the level designation of the current library atom string is less than that of the previous library atom string. The library atom type string with the maximum level designation is used to model the requested atom in the molecule.
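The level table and the search rule can be sketched as below. This is an illustration under several assumptions: atom types are represented as dictionaries rather than strings, the five attributes are compared cumulatively from left to right as the table suggests, an element mismatch maps to level -3, and the special monovalent case (level -2) is not modeled. Stopping rule (b) depends on how the library is ordered, so the sketch keeps only the early exit on a perfect match. All names and attribute encodings are hypothetical.

```python
# Attributes in the order of the table columns (hypothetical key names).
FEATURES = ("element", "valency", "neighbors", "neighbor_valency", "ring_size")


def fit_level(requested, candidate):
    """Level from the number of leading attributes that match.

    5 matches -> 3, 4 -> 2, 3 -> 1, 2 -> 0, 1 -> -1, 0 (element differs) -> -3.
    """
    matched = 0
    for feature in FEATURES:
        if requested[feature] != candidate[feature]:
            break
        matched += 1
    return matched - 2 if matched else -3


def best_library_match(requested, library):
    """Scan the library and return (entry, level) with the highest level."""
    best, best_level = None, -4  # below the lowest possible level
    for candidate in library:
        level = fit_level(requested, candidate)
        if level > best_level:
            best, best_level = candidate, level
        if level == 3:  # perfect match: stop searching
            break
    return best, best_level
```

With this encoding, a library entry that differs from the requested atom only in ring size scores level 2 and would be preferred over one that already differs in its neighbors (level 0).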


Scaling of the ZINC database with Atomtyper distances


Conclusions & Tentative Hypotheses

• Both the whole-molecule similarity measures investigated show a similar scaling, at different thresholds, with a sharp drop in the number of highly-connected nodes.
• This drop-off indicates that there are very few molecules that are very similar to a large fraction of the database.
  – Chemists engineer molecules for diversity.
• The natural products subset of ZINC seems to have a similar network topology to the full ZINC database.
• Atomtyper distances show a power law degree distribution.
• Note that the Atomtyper distance is not a true molecular similarity measure, but a measure of “scaffold similarity.”
• A power law degree distribution indicates preferential attachment of new nodes to highly-connected nodes.
  – Chemists synthesize new molecules from well-known scaffolds.


What next?

• PubChem mapping: ongoing
• Analysis of other subspaces
• Other similarity measures
  – Similarity in bioactivity, e.g. SALI, SARI
  – “Alchemical” similarity
  – Fingerprints: ongoing
• Scale-free or hierarchical?
  – Clustering coefficient
• Modification of the distance measure to account for synthetic accessibility leads to obvious implications for lead optimization.


ACKNOWLEDGMENTS

• Mike Krein, Graduate Student, Chemistry
• Curt Breneman, Director, RECCR Center
• NIH Molecular Libraries Initiative


Reserve Slides