234th ACS Meeting Boston 2007 C. Laggner
Construction of a Virtual Library of Potential Endocrine Disruptors
for in silico Target Fishing
Christian Laggner, PhD.
Computer Aided Molecular Design Group
Pharm. Chem. Dept.
University of Innsbruck, Austria
Overview
• What are Endocrine Disruptors?
• Need for computational screening methods
• Construction of the compound library
• Applicability of publicly available compound collections: problems, needs
Endocrine Disruptors
• Exogenous substances that interfere with the endocrine system of humans or animals
  – Mimic endogenous hormones
  – Block the effects of hormones
  – Change hormone levels: stimulate or inhibit production, transport, or degradation
• Disturb regulation of development, growth, reproduction, and behavior
• Some common targets:
  – nuclear hormone receptors (ER, AR, PR, AhR, PPAR, RXR, TR, …)
  – oxidoreductases (Aro, 11β-HSD, …)
ED chemicals come from various sources:
• Pesticides
  – Insecticides
  – Bactericides, fungicides
• Additives in polymers
• Drugs
  – Side effects
  – Release into the environment, affecting wildlife (wastewater)
• Phytoestrogens
• Formed from precursor substances
  – Incomplete combustion
  – Wastewater
Some Examples
[Figure: structures of example endocrine disruptors, including chlorinated aromatic compounds, organotins R3SnX (R = Ph, nBu), and phenolic and steroidal compounds]
ED Screening Programs
• US: Endocrine Disruptor Screening Program (EDSP) http://www.epa.gov/scipoly/oscpendo/index.htm
• EU: REACH program (Registration, Evaluation and Authorisation of Chemicals), Endocrine Disrupters Website http://ec.europa.eu/environment/endocrine/index_en.htm
Tens to hundreds of thousands of compounds from various sources need to be screened against multiple targets
  – prioritize a small subset for initial screening
Virtual High-Throughput Screening
• Collection of pharmacophore models for over 300 unique targets, including ED targets
• Fast screening of x compounds against y targets -> activity profiles
  – Find new candidates
  – Find new targets
More on pharmacophore-based parallel screening in Thierry's talk at 3:15 pm…
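The "x compounds against y targets -> activity profiles" idea can be sketched as a compound-by-target hit matrix. This is a minimal illustration, not the group's screening software: `is_hit` is a hypothetical callback standing in for a real pharmacophore search, and compounds and targets are plain string IDs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a pharmacophore search: returns True if the
# compound (by ID) fits the pharmacophore model(s) of the given target.
HitFunction = Callable[[str, str], bool]


def activity_profiles(compounds: List[str], targets: List[str],
                      is_hit: HitFunction) -> Dict[str, List[str]]:
    """Screen every compound against every target (len(compounds) * len(targets)
    searches) and return, for each compound, the targets it hits."""
    return {c: [t for t in targets if is_hit(c, t)] for c in compounds}


def hit_rates(profiles: Dict[str, List[str]], targets: List[str]) -> Dict[str, float]:
    """Percentage of screened compounds that hit each target."""
    n = len(profiles)
    if n == 0:
        return {t: 0.0 for t in targets}
    return {t: 100.0 * sum(t in hits for hits in profiles.values()) / n
            for t in targets}
```

Read row-wise, the profiles suggest new targets for a known compound; read column-wise, they give candidate compounds to prioritize for one target.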
But What Shall We Screen?
• Endocrine Disruption Priority Setting Database v.2 http://www.ergweb.com/endocrine/
– For selecting chemicals for Tier 1 Screening
– Pesticides, commercial chemicals, cosmetic ingredients, food additives, nutritional supplements, mixtures, …
– 142,975 entries
– No structures, but compound names and CAS numbers
– Merge with structures from a public substance library (PubChem)
The PubChem Project
• Part of NIH's Molecular Libraries Roadmap Initiative
• Collects structures and information about molecules from various databases
  – DB sources: substance vendors, biological properties, toxicology, metabolic pathways, …
  – links to original database
• Mixed bag of goodies: different information for different molecules
The PubChem Project
• Data organized into 3 sub-databases:
  – PCSubstance: more than 19 million substance records (= original database entries)
  – PCCompound: more than 10 million compound records (= unique structures)
  – PCBioAssay: almost 600 bioassays with data for selected compounds
• Data publicly accessible via
  – web browser: http://pubchem.ncbi.nlm.nih.gov/
  – ftp client: ftp://ftp.ncbi.nlm.nih.gov/pubchem/
  – a programmatic XML interface (PUG): http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi
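For programmatic name lookups, a minimal sketch follows. It assumes PubChem's PUG-REST service, a REST-style interface NCBI introduced after the XML PUG listed above, so it is not the interface used in this work. The functions only build request URLs; fetching them (for example with `urllib.request.urlopen`) needs network access and returns JSON. Name searches match synonyms, which often include CAS numbers, so a CAS string can be passed as the "name".

```python
from urllib.parse import quote

# Base URL of PubChem's PUG-REST service (a later interface than the XML PUG).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def substance_sids_by_name_url(name: str) -> str:
    """URL listing the substance IDs (SIDs) whose names/synonyms match `name`."""
    # safe="" also encodes "/" so that a name cannot break the URL path
    return f"{PUG_REST}/substance/name/{quote(name, safe='')}/sids/JSON"


def compound_cids_by_name_url(name: str) -> str:
    """URL listing the compound IDs (CIDs) whose names/synonyms match `name`."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"
```

For example, `substance_sids_by_name_url("ferric ammonium citrate")` encodes the spaces as `%20`.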
Pipeline Pilot
• Graphically compose data processing networks (“protocols”)
• Configurable components for each step
Library Generation Overview
Name list -> Search PC Substances -> Unmerged hits -> Merge by same name -> Merged hits
CAS nr. list -> Search PC Substances -> Unmerged hits -> Merge by same CAS nr. -> Merged hits
Merged hits (name + CAS) -> Merge by structure -> Unique structures -> Filter, 3D conversion -> 3D database
Initial Searches
Name list:
• Names exist for 65.1% of the initial 143.0k list entries
• Filtered out:
  – No CAS number ("Roofing paper", "Putrescent whole egg solids", "Red pepper", "Paint", …)
  – Name contains "polymer", "derivative", or "analogue"
  – Name shorter than 4 characters
[Figure: string length distribution of the names; peaks indicate truncated names]
62305 (43.6%) unique names remaining
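The name-list filter can be written as a small predicate. A minimal sketch, assuming each list entry is a (name, CAS number) pair; the substring matching and the case-insensitive de-duplication are assumptions, and the detection of truncated names from the length distribution is not reproduced.

```python
from typing import Iterable, List, Optional, Tuple

# Filter criteria as listed above; the exact string matching is an assumption.
EXCLUDED_WORDS = ("polymer", "derivative", "analogue")
MIN_NAME_LENGTH = 4


def keep_entry(name: Optional[str], cas: Optional[str]) -> bool:
    """Return True if a (name, CAS number) list entry should be searched by name."""
    if not name or not cas:  # no name, or no CAS number ("Red pepper", "Paint", ...)
        return False
    name = name.strip()
    lowered = name.lower()
    if any(word in lowered for word in EXCLUDED_WORDS):
        return False
    return len(name) >= MIN_NAME_LENGTH


def unique_search_names(entries: Iterable[Tuple[Optional[str], Optional[str]]]) -> List[str]:
    """Filter entries and drop duplicate names (case-insensitive), keeping the first spelling."""
    seen = set()
    names = []
    for name, cas in entries:
        if not keep_entry(name, cas):
            continue
        key = name.strip().lower()
        if key not in seen:
            seen.add(key)
            names.append(name.strip())
    return names
```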
Initial Searches
• Search with name list in PubChem Substances, July 2007 (17.8 million entries):
  – 85,000 hits
  – 46.6% of list entries found
  – Takes 11.5 h on a Pentium 4, 3.0 GHz
CAS number list:
• 97.0% had a unique number
• Search in PubChem Substances:
  – 179,000 hits
  – 83.5% of list entries found
  – Takes 46 min
• Only 3060 entries found by name and not by CAS number
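Comparing the two searches is plain set arithmetic. A sketch, assuming each list entry has a hypothetical ID (for example its row number in the priority-setting list) and that each search returns the IDs of the entries it found:

```python
from typing import Dict, Hashable, Iterable


def search_coverage(entries: Iterable[Hashable], found_by_name: Iterable[Hashable],
                    found_by_cas: Iterable[Hashable]) -> Dict[str, float]:
    """Compare which list entries were found by name search and by CAS search."""
    all_entries = set(entries)
    by_name = set(found_by_name) & all_entries
    by_cas = set(found_by_cas) & all_entries
    n = len(all_entries)
    return {
        "pct_by_name": 100.0 * len(by_name) / n,
        "pct_by_cas": 100.0 * len(by_cas) / n,
        "pct_either": 100.0 * len(by_name | by_cas) / n,
        # entries that only the (much slower) name search recovered
        "name_only": float(len(by_name - by_cas)),
    }
```

A small `name_only` count relative to the list size is what makes the CAS search the better first choice.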
Merge Hits for Same Search Terms
• Keep only hits that have a molecular structure, are not isotope-labeled, and contain no R-groups; correct protonation states
• Merge name hits by name and CAS hits by CAS number
• How to check whether different structures describe the same molecule?
  – Stereochemistry is not always fully described
    Solution: remove stereochemistry and compare SMILES strings
  – Different tautomers of the same compound give different SMILES strings
    Solution (not for all cases): InChI
InChI
• IUPAC International Chemical Identifier
• Describes chemical structures in layers and sublayers: chemical formula, connectivity, charges, protonation states, stereochemistry, isotopes, tautomerism
• Different layers make it possible to adjust the level of similarity/identity
but
• Tautomerism detection does not include keto-enol or ring-chain tautomerism (sugars, …)
Example (L-ascorbic acid): InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
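Because the layers are separated by "/" and identified by a one-letter prefix, they can be split mechanically. A minimal sketch, assuming a single-component InChI and handling only the main layers; it stops at a fixed-H (/f) or reconnected (/r) layer, whose sublayers reuse the same prefixes. It is a simplification, not a replacement for the official InChI library.

```python
from typing import Dict

# Meaning of the layer prefixes handled here (main layers of an InChI string).
LAYER_NAMES = {
    "c": "connectivity", "h": "hydrogen atoms", "q": "charge",
    "p": "protonation", "b": "double-bond stereo", "t": "tetrahedral stereo",
    "m": "stereo inversion flag", "s": "stereo type", "i": "isotopes",
}


def inchi_layers(inchi: str) -> Dict[str, str]:
    """Split an InChI string into version, formula and prefixed layers.

    Parsing stops at a fixed-H (/f) or reconnected (/r) layer, so only the
    main (mobile-H) part of the identifier is returned.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if not part:
            continue
        prefix, content = part[0], part[1:]
        if prefix in ("f", "r"):
            break
        layers[prefix] = content
    return layers
```

For the ascorbic acid example this yields formula `C6H8O6`, tetrahedral stereo `2-,5+`, inversion flag `0` and stereo type `1`.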
Merge Hits for Same Search Terms
• Merge multiple structures per entry by InChI without the stereo and tautomerism layers
• Prefer the structure with the most stereo information (longest SMILES string)
• 4.5% (by name) and 7.9% (by CAS number) of entries still had different structures: errors, additional components
  – Check whether a preferred structure can be found
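The merging rule can be sketched with string operations on InChI. Assumptions: each hit for one search entry comes as an (InChI, SMILES) pair; the merge key keeps formula, connectivity, mobile-H, charge and protonation layers and drops stereo (/b, /t, /m, /s) and isotope (/i) layers; anything from a fixed-H (/f) layer onward is dropped, which is how "without the tautomerism layer" is interpreted here. SMILES length is used as the proxy for stereo information, as stated above.

```python
from typing import Dict, List, Tuple

# Layers kept in the merge key; stereo and isotope layers are dropped.
KEPT_LAYERS = ("c", "h", "q", "p")


def merge_key(inchi: str) -> str:
    """InChI reduced to formula, connectivity, mobile H, charge and protonation."""
    parts = inchi.split("/")
    kept = parts[:2]  # "InChI=1" (or "InChI=1S") and the formula
    for part in parts[2:]:
        if not part:
            continue
        if part[0] in ("f", "r"):  # fixed-H / reconnected layers: stop here
            break
        if part[0] in KEPT_LAYERS:
            kept.append(part)
    return "/".join(kept)


def merge_structures(records: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each merge key to the preferred SMILES (the longest one seen)."""
    best: Dict[str, str] = {}
    for inchi, smiles in records:
        key = merge_key(inchi)
        if key not in best or len(smiles) > len(best[key]):
            best[key] = smiles
    return best
```

Entries that still map to more than one key after this step are the ones that need a preferred structure chosen by other means.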
Looking for the Preferred Structure
"ferric ammonium citrate" gives two results:
• 3 hits (from three different databases), 6 carbon atoms
• 4 hits (all from one database), 7 carbon atoms: WRONG
1. Preferred structure among hits from one database
2. Preferred structure among all hits
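The slide does not spell out the exact selection rule, so the following is only one plausible reading, not the author's implementation: counting every record as a vote lets four copies from a single database outvote three independent databases, whereas counting each database at most once per structure does not. Structure keys and database names below are hypothetical labels.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# One hit: (structure key, source database). Both are hypothetical labels.
Hit = Tuple[str, str]


def preferred_by_raw_count(hits: List[Hit]) -> str:
    """Structure that occurs most often among all hits (each record votes)."""
    counts = Counter(structure for structure, _ in hits)
    return counts.most_common(1)[0][0]


def preferred_by_source_count(hits: List[Hit]) -> str:
    """Structure supported by the most distinct source databases
    (each database votes at most once per structure)."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    for structure, database in hits:
        sources[structure].add(database)
    return max(sources, key=lambda s: len(sources[s]))
```

With three 6-carbon hits from three databases and four 7-carbon hits from one database, the raw count picks the 7-carbon structure while the per-database count picks the 6-carbon one.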
Merge Salts and Mixtures
• Remove small counterions and mixture components, neutralise compounds: one structure preferred for >80% of problematic hits
• Check for wrong valences
• Merge all compounds with the same structure
  – prioritize names of unmodified compounds
  – save names, CAS numbers, … of the others
Example, names merged into one citric acid entry: Citric acid; 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, manganese salt; Magnesium citrate; Citric acid monohydrate; 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, lead(2+) salt (2:3); Ferrous citrate; 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, iron salt; 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, iron(3+) salt (1:1); Ferric citrate; Ferric ammonium citrate; Sodium citrate; Sodium citrate dihydrate
[Figure: structure of citric acid]
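Removing small counterions is often done with a "keep the largest fragment" rule. The sketch below is a simple stand-in for that step (the protocol's own Pipeline Pilot components are not reproduced): it splits a SMILES string at "." and keeps the fragment with the most heavy-atom tokens. The token regex is an approximation, and neutralisation is not covered.

```python
import re

# Atom tokens in SMILES: bracket atoms, two-letter organic-subset atoms,
# then one-letter aliphatic and aromatic organic-subset atoms.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles_fragment: str) -> int:
    """Approximate number of explicitly written atoms in a SMILES fragment."""
    return len(ATOM_TOKEN.findall(smiles_fragment))


def largest_fragment(smiles: str) -> str:
    """Keep the fragment with the most atoms, dropping counterions and solvents."""
    fragments = [f for f in smiles.split(".") if f]
    if not fragments:
        raise ValueError("empty SMILES")
    return max(fragments, key=count_atoms)
```

For trisodium citrate, `"[O-]C(=O)CC(O)(CC([O-])=O)C([O-])=O.[Na+].[Na+].[Na+]"`, the citrate fragment (13 heavy atoms) is kept and the sodium ions are dropped.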
Checking for Correct Valences
Pentavalent carbon atoms are not as rare as you might think…
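A valence check sums, for each atom, its bond orders plus attached hydrogens and compares the total with the allowed maximum. A minimal sketch on a hypothetical graph representation (element list, implicit-H counts, bond list with orders), assuming all atoms are uncharged; the maximum valences are simplified, and elements not in the table are skipped.

```python
from typing import Dict, List, Tuple

# Simplified maximum valences for neutral atoms (assumption: no charges).
MAX_VALENCE: Dict[str, int] = {
    "H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1,
    "S": 6, "P": 5,
}


def overvalent_atoms(elements: List[str], implicit_h: List[int],
                     bonds: List[Tuple[int, int, int]]) -> List[int]:
    """Return indices of atoms whose bond orders + H count exceed the maximum."""
    valence = list(implicit_h)
    for i, j, order in bonds:  # (atom index, atom index, bond order)
        valence[i] += order
        valence[j] += order
    return [idx for idx, (element, v) in enumerate(zip(elements, valence))
            if element in MAX_VALENCE and v > MAX_VALENCE[element]]
```

A carbon bonded to four chlorine atoms that also carries one hydrogen has valence 5 and is flagged; the same carbon without the hydrogen is fine.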
Final Filtering
Keep only compounds suitable for pharmacophore screening:
– Only selected elements: H, C, N, O, S, P, F, Cl, Br, I, B, Al, Si, Ge, As, Se, Sn, Sb, Te, Pb
– Must have at least one C atom
– 70 ≤ MW ≤ 1000
76754 compounds remaining, 63.9% of the search list
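The three filter criteria can be applied directly to a molecular formula. A sketch, assuming simple Hill-style formulas without parentheses or charges; the atomic weights are standard values rounded to a few decimals, and this is not the tool used in the talk.

```python
import re
from collections import Counter

# Allowed elements (as listed above) with standard atomic weights (rounded).
ATOMIC_WEIGHTS = {
    "H": 1.008, "B": 10.81, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "Al": 26.982, "Si": 28.085, "P": 30.974, "S": 32.06,
    "Cl": 35.45, "Ge": 72.630, "As": 74.922, "Se": 78.971, "Br": 79.904,
    "Sn": 118.71, "Sb": 121.76, "I": 126.90, "Te": 127.60, "Pb": 207.2,
}
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Element counts of a simple formula such as 'C6H8O7'."""
    counts: Counter = Counter()
    pos = 0
    for m in FORMULA_TOKEN.finditer(formula):
        if m.start() != pos:
            raise ValueError(f"cannot parse formula: {formula}")
        counts[m.group(1)] += int(m.group(2) or 1)
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse formula: {formula}")
    return counts


def passes_final_filter(formula: str) -> bool:
    counts = parse_formula(formula)
    if any(element not in ATOMIC_WEIGHTS for element in counts):
        return False  # element outside the allowed set
    if counts["C"] < 1:
        return False  # no carbon atom
    mw = sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())
    return 70 <= mw <= 1000
```

Citric acid (C6H8O7, about 192 g/mol) passes; methane fails on MW and an iron citrate formula fails on the element check.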
Construction of the 3D Database
• Prepare 3D start conformation: add H atoms, generate 3D coordinates, minimize
• Generate 3D database with Catalyst catDB (FAST, MaxConfs = 255): 76677 successfully converted (99.9%)
Analysis of the Database
• Derwent WDI 2005 (67050 entries): filtered, desalted, and merged in the same way; 57667 entries remaining
• Overlap: 8513 entries (14.8% WDI, 9.0% EDPSD)
• Oral bioavailability (Lipinski's Rule of 5):
  – WDI 64.0%
  – EDPSD 79.2%
• Druglikeness (Ghose et al. 1999):
  – WDI 39.7%
  – EDPSD 18.2%
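Both property filters are simple threshold checks once descriptors are available. A sketch, assuming the descriptors were computed beforehand (the talk does not say with which software). Allowing one Rule-of-5 violation is a common convention and an assumption here; the Ghose ranges are the qualifying ranges from Ghose et al. (1999), with the atom count assumed to include hydrogens.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (how they are computed is left open)."""
    mw: float                  # molecular weight
    logp: float                # calculated logP
    hbd: int                   # hydrogen-bond donors
    hba: int                   # hydrogen-bond acceptors
    molar_refractivity: float  # calculated molar refractivity
    n_atoms: int               # total atom count (assumed to include hydrogens)


def lipinski_violations(d: Descriptors) -> int:
    """Number of Rule-of-5 limits exceeded."""
    return sum([d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    # Allowing one violation is a common convention (assumption).
    return lipinski_violations(d) <= max_violations


def passes_ghose(d: Descriptors) -> bool:
    """Ghose et al. (1999) qualifying ranges for drug-like compounds."""
    return (-0.4 <= d.logp <= 5.6
            and 40 <= d.molar_refractivity <= 130
            and 160 <= d.mw <= 480
            and 20 <= d.n_atoms <= 70)
```

The percentages above correspond to the fraction of a database for which such a predicate returns True.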
Analysis of Results
[Figure: comparison of the two databases; red: WDI, blue: EDPSD]
First Screening Results
Target % EDPSD % WDI
ER 0.52 0.86
ER 0.13 0.43
PPAR 0.40 0.70
PPAR 2.53 4.41
PPAR 8.20 16.74
RXR 0.80 0.59
RXR 1.68 2.13
TR 0.03 0.17
TR 0.01 0.07
Harvesting Structures from Public DBs
• Many common chemicals can be retrieved by comparing public compound lists
• Searching via a registry number (CAS, SID, CID, EINECS/ELINCS, …) is much faster than via name
– Names are split between PCSubstances and PCCompounds
– Often the wrong CAS number is given (salts, hydrates, mixtures, …)
PCS: PUBCHEM_EXT_DATASOURCE_REGID: 408148
PUBCHEM_SUBSTANCE_SYNONYM: 1H-Benzimidazol-5-amine, 2-(4-aminophenyl)-; 2-(4-Aminophenyl)-5-aminobenzimidazole; 7621-86-5; NSC408148
PCC: PUBCHEM_IUPAC_OPENEYE_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_CAS_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_SYSTEMATIC_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_TRADITIONAL_NAME: [2-(4-aminophenyl)-3H-benzimidazol-5-yl]amine
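A cheap sanity check before searching by CAS number is the check digit: the last digit must equal the sum of all other digits, each multiplied by its position counted from the right, modulo 10. The sketch below implements that rule. It only catches typing errors; a salt or hydrate CAS number given for the parent compound is still a valid number (of a different substance) and passes.

```python
import re

# CAS Registry Number format: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    m = CAS_PATTERN.fullmatch(cas.strip())
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # weight 1 for the rightmost digit before the check digit, 2 for the next, ...
    checksum = sum(int(d) * i for i, d in enumerate(reversed(digits), start=1))
    return checksum % 10 == int(m.group(3))
```

For 7621-86-5: 6·1 + 8·2 + 1·3 + 2·4 + 6·5 + 7·6 = 105, and 105 mod 10 = 5 matches the check digit.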
Harvesting Structures from Public DBs
• Chirality information is often missing or unclearly defined
  – 2D structures: wedged bonds or pseudo-3D
  – 3D structures: atom stereo parity set, or take it from the 3D structure
• Tautomerism: partially solved by InChI
  – No keto-enol tautomerism
  – No ring-chain tautomerism
  – Workaround: connectivity? (together with MW, MF)
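The connectivity workaround can be expressed as a merge key made of the molecular formula and the InChI connectivity (/c) layer. A sketch under that assumption: since the formula fixes the MW, a separate MW comparison adds nothing. The key ignores hydrogen positions, charges and stereo, so keto and enol forms (for example acetone and prop-1-en-2-ol) collapse together, but so do non-tautomeric isomers that differ only in double-bond position, such as but-1-ene and but-2-ene.

```python
from typing import Tuple


def skeleton_key(inchi: str) -> Tuple[str, str]:
    """(molecular formula, connectivity layer) of an InChI string.

    Hydrogen positions (/h), charges, protonation and stereo are ignored, so
    tautomers that InChI keeps apart (e.g. keto vs. enol) get the same key.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    formula = parts[1] if len(parts) > 1 else ""
    connectivity = next((p[1:] for p in parts[2:] if p.startswith("c")), "")
    return formula, connectivity
```

Because the key is deliberately coarse, merges based on it are best flagged for inspection rather than applied blindly.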
Conclusions
• Public databases and compound lists are useful for in silico reprofiling of known compounds
• Different sources provide different levels of information
– Need standards for treating stereo information
– Problem of tautomerism
• There are always some errors…
– Comparison of different data sources may help us find some of them
– How can we give feedback about wrong structures and avoid further spreading of errors?
Acknowledgements
• Simona Distinto
• Johannes Kirchmair
• Thierry Langer
• Patrick Markt
• Daniela Schuster
• Gudrun Spitzer
• Theodora Steindl
• Fabian Bendix
• Martin Biely
• Alois Dornhofer
• Robert Kosara
• Judith Rollinger
• Gerhard Wolber
• Rémy D. Hoffmann
• Nicolas Triballeau
• Lyubomir G. Nashev
• Alex Odermatt
NIH / PubChem Project
EPA / Endocrine Disruptor Screening Program
Finally…
Thank you for your attention!