Online Chemical Database with Modelling Environment

AACIMP 2009 Summer School lecture by Yuriy Sushko and Sergii Novotarskyi. "Environmental Chemoinformatics" course.

“Online chemical database with modeling environment”: a summer school course

Sergii Novotarskyi, Iurii Sushko

Chemoinformatics – overview of online resources: Chemical databases

1. PubChem – a database that provides information on the biological activities of small molecules

2. ChemSpider – a free-access service providing a structure-centric community for chemists

3. ChemIDplus – a tool that provides chemical structure, property, and toxicity searching

4. ChemBank – a database of chemical structures and assays

5. ChemDB – a set of chemoinformatics tools

Chemoinformatics – overview of online resources: Literature databases

6. PubMed – a service that includes over 19 million citations from MEDLINE and other life science journals for biomedical articles back to 1948

7. Toxicology Literature Online (TOXLINE) – references from toxicology literature

8. ScienceDirect – a full-text scientific database offering articles and chapters from more than 2,500 peer-reviewed journals and more than 10,000 books

9. ACS Publications – a worldwide scientific community with a collection of the most cited peer-reviewed journals in the chemical and related sciences

Chemoinformatics – overview of online resources: PubChem – start page

URL: http://pubchem.ncbi.nlm.nih.gov/ or search the web for «PubChem»

Chemoinformatics – overview of online resources: PubChem – search results

Chemoinformatics – overview of online resources: PubChem – compound details

Chemoinformatics – overview of online resources: PubChem – bioassay search results

Chemoinformatics – overview of online resources: ChemSpider – start page

URL: http://www.chemspider.com/ or search the web for «ChemSpider»

Chemoinformatics – overview of online resources: ChemSpider – search results

Chemoinformatics – overview of online resources: ChemIDplus – main page

URL: http://chem.sis.nlm.nih.gov/chemidplus/ or search the web for «ChemIDplus»

Chemoinformatics – overview of online resources: ChemIDplus – search results

Chemoinformatics – overview of online resources: ChemBank – main page

URL: http://chembank.broadinstitute.org/ or search the web for «ChemBank»

Chemoinformatics – overview of online resources: ChemBank – search results

Chemoinformatics – overview of online resources: ChemDB – main page

URL: http://cdb.ics.uci.edu/ or search the web for «ChemDB»

Chemoinformatics – overview of online resources: ChemDB – search results

Chemoinformatics – overview of online resources: PubMed – main page

URL: http://www.ncbi.nlm.nih.gov/pubmed/ or search the web for «PubMed»

Online chemical database with modeling environment: The subject of development

The web-based service

The database of physical, chemical and biological properties

• Accumulating experimentally verified data
• Providing user-friendly web-based access to this data

The QSPR modeling environment

• Providing web-based tools for QSPR modeling
• Storing and “publishing” created models

Online chemical database with modeling environment: Motivation

Our motivation

The importance of QSPR modeling

The importance of web-based tools for QSPR modeling

The importance of building one more service in this field

Online chemical database with modeling environment: Motivation - QSPR

Structure-property relationship hypothesis: “Similar structures - similar properties”

QSPR modeling: predicting properties based on available data for structurally similar molecules.

Structures are represented by a set of descriptors (atom count, molecular weight).

Figure: example molecules with known values log(IC50) = 0.64 log(µM), 1.87 log(µM) and 1.87 log(µM), and a structurally similar molecule whose value is to be predicted (log(IC50) = ?).

Online chemical database with modeling environment: QSPR – Similarity in descriptor space

Number of specific fragments in a molecule
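The idea of predicting a property from structurally similar molecules can be sketched as a nearest-neighbour lookup in descriptor space. This is only an illustration under stated assumptions: the fragment-count descriptors, the query vector and the function name `knn_predict` are hypothetical, and the sketch is not the modeling method used by the service. The three training values reuse the log(IC50) numbers from the QSPR slide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, training_set, k=2):
    """Average the property values of the k training molecules closest to `query`."""
    if not 1 <= k <= len(training_set):
        raise ValueError("k must be between 1 and the training set size")
    ranked = sorted(training_set, key=lambda item: euclidean(query, item[0]))
    return sum(value for _, value in ranked[:k]) / k


# Hypothetical descriptors: counts of three specific fragments per molecule.
# The property values are the log(IC50) numbers from the slide.
training_set = [
    ((1, 0, 0), 0.64),
    ((2, 1, 0), 1.87),
    ((2, 1, 1), 1.87),
]

query = (2, 2, 0)  # hypothetical molecule with unknown log(IC50)
prediction = knn_predict(query, training_set, k=2)
print(f"predicted log(IC50) = {prediction:.2f}")
```

With these made-up descriptors the two nearest neighbours both have log(IC50) = 1.87, so the prediction is their average. Real descriptor sets are much larger and are usually scaled before distances are computed.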

Online chemical database with modeling environment: Motivation - web-based tools for modeling

Main benefits of web-based tools:

• Availability and accessibility: only a computer with Internet access and a modern web browser is required to start working; work materials can be shared among several locations; works on any platform (Linux, Win, Mac)

• Communication and collaboration: possibility to work on common topics, publish one's own results and use new results of other people

Online chemical database with modeling environment: Motivation - one more web-based tool

Reasons to build one more service:

• Different approach to data modification: a completely open database, any user can add, delete and edit data (only constrained by a set of simple rules)

• Different approach to data organization: data in the database is organized in a way suitable for QSPR modeling

• Integration of a database with modeling tools: data from the database can be used for model creation and property prediction

Online chemical database with modeling environment: Distinctive features

The features that make our service different:

• “Wiki” approach to data handling: users can add, modify and delete data

• Mandatory reference to an article: every record in the database should contain a reference to the article where the data was published

• Storing additional information: we store measurement conditions to increase data quality

• Several tools to support decision making: integration with other web services (validation of molecule names against the PubChem database, automatic fetching of article information from PubMed), duplicate records management

• Aimed at model building: convenient to build training sets from data; filter by property or article and export data either to internal modeling tools or download as an Excel file

Online chemical database with modeling environment: Data structure

Online chemical database with modeling environment: Simplified data structure

Diagram of the main entities: Records, Properties, Molecules, Articles, Units, Journals, Conditions, Users
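The simplified data structure can be pictured as a handful of linked entities. The sketch below is one possible reading of the diagram, not the service's actual schema: all field names and the links between entities are assumptions, except the rules stated on the slides (every record references an article, and measurement conditions are stored with the record). The user login is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Journal:
    title: str


@dataclass(frozen=True)
class Article:
    title: str
    pubmed_id: Optional[int] = None
    journal: Optional[Journal] = None


@dataclass(frozen=True)
class Molecule:
    name: str
    smiles: Optional[str] = None


@dataclass(frozen=True)
class Unit:
    name: str


@dataclass(frozen=True)
class Property:
    name: str
    unit: Optional[Unit] = None  # qualitative properties may have no unit


@dataclass(frozen=True)
class User:
    login: str


@dataclass
class Record:
    """One measured value of a property for a molecule, as published in an article."""
    molecule: Molecule
    prop: Property
    value: str
    article: Article
    introducer: User
    conditions: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Rule from the slides: every record must reference an article.
        if self.article is None:
            raise ValueError("every record must reference an article")


record = Record(
    molecule=Molecule(name="Mexiletine"),
    prop=Property(name="CYP450 Modulation"),
    value="Inhibitor",
    article=Article(title="Involvement of CYP1A2 in mexiletine metabolism", pubmed_id=9690950),
    introducer=User(login="student"),  # hypothetical user
    conditions={"CYP450 Type": "CYP1A2"},
)
print(record.molecule.name, record.value, record.conditions)
```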

Online chemical database with modeling environment: User interface conventions

Browser-based interface

Online chemical database with modeling environment: User interface conventions

Icons

Edit current record (item, article, unit, etc.)

Delete current record

In most places: open a record-specific submenu; in some places: view profile

Open a wiki page with additional explanations

Send a message to the user

Download data in XLS format

Select item

Online chemical database with modeling environment: Summary

The database currently contains:

More than 50000 records

Around 285 properties

More than 2700 articles

Thank you

Online chemical database with modeling environment: Practical course - outline

• Collection of data from original literature

• Use of publicly available tools for literature and chemical structure lookup

• Introduction of data to OCHEM — single record

• Collection of data from benchmark literature

• Introduction of data to OCHEM — batch upload

Online chemical database with modeling environment: Practical course – collection of data – before we start

# | Article name | PubMedID | Compound name | Value
1 | | | |
2 | | | |
3 | | | |
4 | | | |
5 | | | |

Online chemical database with modeling environment: Practical course – collection of data

The goal: archive data on CYP450 1A2 inhibitors and noninhibitors

Cytochrome P450 (abbreviated CYP, P450, CYP450) is a very large and diverse superfamily of hemoproteins found in all domains of life. © Wikipedia

PubMed search terms: CYP1A2 inhibition
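The same PubMed search can also be run programmatically. The sketch below was not part of the original course, which used the PubMed web page; it assumes the NCBI E-utilities `esearch` endpoint with its `db`, `term`, `retmax` and `retmode` parameters. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (an assumption of this sketch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, max_results=5):
    """Build an esearch URL that asks PubMed for article IDs matching `term`."""
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def search_pubmed(term, max_results=5):
    """Return a list of PubMed IDs (as strings). Requires network access."""
    with urlopen(build_pubmed_search_url(term, max_results), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Only the URL is built here; call search_pubmed(...) to actually query PubMed.
print(build_pubmed_search_url("CYP1A2 inhibition"))
```

Calling `search_pubmed("CYP1A2 inhibition")` would return the PubMed IDs of matching articles. The articles themselves still have to be read to extract compounds and their values.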

Online chemical database with modeling environment: Practical course – data collection

# | Article name | PubMedID | Compound name | CYP Modulation
1 | Chemical genomics of cancer chemopreventive dithiolethiones | 19126641 | 3H-1,2-dithiole-3-thione | Inhibitor
1 | Chemical genomics of cancer chemopreventive dithiolethiones | 19126641 | 4-methyl-5-pyrazinyl-3H-1,2-dithiole-3-thione | Noninhibitor
1 | Chemical genomics of cancer chemopreventive dithiolethiones | 19126641 | 5-tert-butyl-3H-1,2-dithiole-3-thione | Noninhibitor
2 | Comprehensive in vitro analysis of voriconazole inhibition of eight cytochrome P450 (CYP) enzymes: major effect on CYPs 2B6, 2C9, 2C19, and 3A | 19029318 | Voriconazole | Noninhibitor
3 | Involvement of CYP1A2 in mexiletine metabolism | 9690950 | Mexiletine | Inhibitor
4 | Differential inhibition of cytochrome P450 isoforms by the protease inhibitors, ritonavir, saquinavir and indinavir | 9278209 | Indinavir | Noninhibitor
5 | An evaluation of potential mechanism-based inactivation of human drug metabolizing cytochromes P450 by monoamine oxidase inhibitors, including isoniazid | 16669850 | Clorgyline | Inhibitor

Online chemical database with modeling environment: Practical course – data introduction – cheat sheet

Good chemistry lookup engine: PubChem (find URL in Google.com)

We search by name, and want to get structure

Convenient structure representation - SMILES

Property: CYP450 Modulation

Condition: CYP450 Type = CYP1A2
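The "search by name, get structure" step can also be scripted. The sketch below was not part of the original course, which used the PubChem web page; it assumes PubChem's PUG REST interface and its `CanonicalSMILES` property name, both of which should be checked against the current PubChem documentation. The function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST interface (an assumption of this sketch).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_smiles_url(compound_name):
    """Build a PUG REST URL that asks for the SMILES of a compound given its name."""
    encoded = quote(compound_name, safe="")
    return f"{PUG_REST}/compound/name/{encoded}/property/CanonicalSMILES/TXT"


def fetch_smiles(compound_name):
    """Return the plain-text response for the name. Requires network access.

    The response may contain several lines if the name matches several compounds.
    """
    with urlopen(build_smiles_url(compound_name), timeout=30) as response:
        return response.read().decode("utf-8").strip()


# Only the URL is built here; call fetch_smiles(...) to actually query PubChem.
print(build_smiles_url("Mexiletine"))
```

A retrieved structure should still be compared with the one drawn in the article before the record is saved, since a name can be ambiguous.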

Online chemical database with modeling environment: Practical course – batch data introduction – template

• CASRN – CAS Registry Number
• SMILES – SMILES string
• NAME – molecule name
• ARTICLEID – article identifier (PubMed or OCHEM)
• PAGE – article page
• TABLE – article table
• LINE – article line
• COMMENT – text comment
• REFERENCE – record reference
• CYP450 Modulation – value of the property
• Unit – measurement unit of the property
• Accuracy – measurement accuracy
• Interval – measurement interval
• CYP450 Type – record condition
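The template columns above can be filled in programmatically before upload. The following is a minimal sketch under stated assumptions: it writes CSV text rather than the spreadsheet template used in the course, it leaves unknown fields empty, and it assumes that a bare PubMed ID is acceptable in the ARTICLEID column. The helper names `make_row` and `to_csv` are hypothetical. The example rows come from the data collection table.

```python
import csv
import io

# Column names taken from the batch upload template slide.
COLUMNS = [
    "CASRN", "SMILES", "NAME", "ARTICLEID", "PAGE", "TABLE", "LINE",
    "COMMENT", "REFERENCE", "CYP450 Modulation", "Unit", "Accuracy",
    "Interval", "CYP450 Type",
]


def make_row(name, article_id, modulation, cyp_type="CYP1A2"):
    """Create one template row; fields that are not supplied stay empty."""
    row = dict.fromkeys(COLUMNS, "")
    row.update({
        "NAME": name,
        "ARTICLEID": article_id,
        "CYP450 Modulation": modulation,
        "CYP450 Type": cyp_type,
    })
    return row


def to_csv(rows):
    """Serialize rows to CSV text with the template columns as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    make_row("Voriconazole", "19029318", "Noninhibitor"),
    make_row("Mexiletine", "9690950", "Inhibitor"),
    make_row("Indinavir", "9278209", "Noninhibitor"),
]
text = to_csv(rows)
print(text)
```

Keeping the column list in one place makes it harder to misspell a header, which matters if the upload matches columns by name.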

Online chemical database with modeling environment: Practical course – batch data introduction – cheat sheet

• Article URL: http://tinyurl.com/rendic
• Article title: «Summary of information on human CYP enzymes: human P450 metabolism data»
• Good chemistry lookup engine: PubChem (find URL in Google.com)
• We search by name, and want to get structure
• Convenient structure representation - SMILES
• Property: CYP450 Modulation
• Condition: CYP450 Type = CYP1A2
• Reference = 1
• ArticleID = Q1592
• Batch upload template URL: http://tinyurl.com/bu-template

Thank you (once more)
