Ontology, RDF, SW for Chemical Structures. T N Bhat & J. Barkley NIST. Bhat@nist.gov. Query tool. Use Case. Publications. Major Features, Goal – to Reduce User Frustration. We have established a use case at the HCLS Website - Chemical taxonomies - PowerPoint PPT Presentation
Ontology, RDF, SW for Chemical Structures
T N Bhat & J. Barkley
NIST
Bhat@nist.gov
Query tool, Use Case, Publications

Major Features, Goal – to Reduce User Frustration
We have established a use case at the HCLS Website - Chemical taxonomies
Combining rule-based terms with vocabulary-based terms to define the elements of RDF
Organizing the elements of RDF into a predictable ontology using concepts from use cases
Developing tools and techniques to present the information using familiar database environments
  – Allows easier portability and implementation of the information by the community
Illustrating the concept using high-profile data such as AIDS inhibitors and Protein Data Bank contents

Combining Rule-based with Vocabulary-based Elements to Define RDF
Chemical structures are definable by atomic connectivity, so they are suitable for identification using graph theory (InChI)
  – Suitable for machine reasoning
Graphs are hard for humans to digest, so the proposal is to combine InChI with familiar vocabularies such as Ala, Phenyl, Adenine
  – Also include synonyms in the vocabulary for greater coverage among diverse users
  – Vocabularies make it easier for humans to recognize the information
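
A minimal sketch of the pairing proposed on this slide: one machine-oriented identifier (an InChI string) carries several human-oriented labels, including synonyms. The `VocabularyIndex` class and its methods are hypothetical and are not part of the authors' tools. The alanine identifier is written in the current standard InChI form, and the synonym list is only illustrative.

```python
from collections import defaultdict


class VocabularyIndex:
    """Maps human-readable names (and synonyms) to one InChI identifier."""

    def __init__(self):
        self._by_name = {}               # lower-cased name -> InChI
        self._names = defaultdict(list)  # InChI -> names as entered

    def add(self, inchi, *names):
        """Register one identifier under any number of names or synonyms."""
        for name in names:
            self._by_name[name.lower()] = inchi
            self._names[inchi].append(name)

    def lookup(self, name):
        """Return the InChI for a name or synonym, or None if unknown."""
        return self._by_name.get(name.lower())

    def names_for(self, inchi):
        """Return every name registered for an InChI (empty list if none)."""
        return list(self._names.get(inchi, []))


# Alanine (stereochemistry layers omitted for brevity).
ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

index = VocabularyIndex()
index.add(ALANINE, "Ala", "Alanine", "2-aminopropanoic acid")
```

A machine can reason over the InChI, while a user can type `Ala` or `alanine` and reach the same entry through `index.lookup(...)`.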

InChI – a Scalable URI
InChI is generated by software that decodes the chemical connectivity information in certain layers, such as chirality, ring structure and atom type, and then re-codes them to form a text string
InChI is a naming standard for chemicals recommended by IUPAC

InChI – a rule-based URI
InChI
  – _1_2FC10H11NO2_2Fc11-10_2812_2913-9-5-7-3-1-2-4-8_287_296-9_2Fh1-4_2C9H_2C5-6H2_2C_28H2_2C11_2C12_29
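
The string on this slide reads like an InChI in which URI-unsafe characters are written as `_` followed by two hexadecimal digits (`_2F` for `/`, `_28` for `(`, `_29` for `)`, `_2C` for `,`). That reading is an assumption inferred from the pattern, not something the slides state. Under that assumption, the sketch below decodes the string and splits it into its slash-separated layers. The function names are hypothetical.

```python
import re

# Identifier as it appears on the slide.
ENCODED = (
    "_1_2FC10H11NO2_2Fc11-10_2812_2913-9-5-7-3-1-2-4-8_287_296-9"
    "_2Fh1-4_2C9H_2C5-6H2_2C_28H2_2C11_2C12_29"
)

# Assumed escape scheme: underscore + two upper-case hex digits.
_ESCAPE = re.compile(r"_([0-9A-F]{2})")


def decode_identifier(encoded):
    """Replace each assumed _XX escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), encoded)


def split_layers(decoded):
    """Split a decoded InChI-like string into prefix, formula and layers.

    Layers after the formula are keyed by their one-letter prefix,
    e.g. 'c' for connectivity and 'h' for hydrogens.
    """
    prefix, formula, *rest = decoded.split("/")
    layers = {"prefix": prefix, "formula": formula}
    for layer in rest:
        layers[layer[0]] = layer[1:]
    return layers


decoded = decode_identifier(ENCODED)
layers = split_layers(decoded)
```

The leading `_1` is left untouched because it is not a two-digit hex escape. It plausibly stands for the InChI version prefix, but that is also a guess.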

Vocabulary-based Definitions
For decades scientists have been developing names to identify structures and their images
  – Simple names
      His
      Ala
      DNA
      ATP
  – Semi-rule-based IUPAC names
      2-amino-3-methylpentanamide
      4-amino-3-hydroxy-6-methylheptanoic_acid
      1-[(Benzenesulfonyl-methyl-amino)-phenyl-butyl]-piperidin-4-yl}-propyl-carbamic acid, naphthalen-1-ylmethyl ester
Names facilitate text-based queries of desired components
Names when used together with InChI provide a smoother integration of machine and human needs
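
As a rough illustration of a text-based query over names, the sketch below filters a list of names by the fragments a user types. The `find_names` helper is hypothetical, and the matching is plain case-insensitive substring search, which is much cruder than real chemical name parsing.

```python
# Names taken from the slide.
NAMES = [
    "His",
    "Ala",
    "DNA",
    "ATP",
    "2-amino-3-methylpentanamide",
    "4-amino-3-hydroxy-6-methylheptanoic_acid",
]


def find_names(names, *fragments):
    """Return the names that contain every requested text fragment."""
    wanted = [fragment.lower() for fragment in fragments]
    return [n for n in names if all(f in n.lower() for f in wanted)]


amino_hits = find_names(NAMES, "amino")
amino_hydroxy_hits = find_names(NAMES, "amino", "hydroxy")
```

A text hit only narrows the candidates for a human reader. The exact identity of each hit would still come from its InChI.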

Use-Case for SW; Treatment for AIDS is a work in progress
Treatments for AIDS are of two types
  – Prevention – the most effective
  – Containment
Drugs to contain and reduce the viral load
  – Majority of the drugs (~17) target either HIV protease or RT
  – Complete suppression of either of these viral enzymes could cure AIDS
  – But drug resistance leads only to partial suppression of the enzymes
All the drug design efforts for AIDS are based on structures
Data needed for drug design is scattered over many Web resources, and users often wade through the data manually
Therefore AIDS drug design is an ideal target for Semantic Web and novel database-related technologies
SW connection between NIST and NIAID AIDS database
Choose the problem that matters
Website

Annotation Technique/Developing Structural Ontology
Define compounds using chemical features of interest to use cases
  – Fragment, subgroup, class
1A8K000503 000505 030798
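
One way to picture annotation by fragment, subgroup and class is as subject-predicate-object triples. The sketch below is only an illustration: the predicate names (`hasFragment`, `hasSubgroup`, `hasClass`), the compound identifiers and the subgroup and class labels are all hypothetical and are not taken from the authors' ontology.

```python
# Hypothetical predicate names, one per kind of chemical feature.
PREDICATES = ("hasFragment", "hasSubgroup", "hasClass")


def annotate(compound, fragments=(), subgroups=(), classes=()):
    """Return (subject, predicate, object) triples describing one compound."""
    triples = []
    for predicate, values in zip(PREDICATES, (fragments, subgroups, classes)):
        triples.extend((compound, predicate, value) for value in values)
    return triples


def compounds_with(triples, predicate, value):
    """Return the sorted compounds annotated with the given feature."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


# Placeholder compounds and labels, for illustration only.
triples = (
    annotate("compound:A", fragments=["Phenyl"], classes=["example-class-1"])
    + annotate("compound:B", fragments=["Phenyl", "Ala"],
               subgroups=["example-subgroup-1"])
)

phenyl_compounds = compounds_with(triples, "hasFragment", "Phenyl")
```

Because every compound is described with the same three kinds of feature, the resulting structure stays predictable from one use case to the next.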

Modeling with Protégé – Suitable for Text-based Ontology

Web tools
Structures are different from text-based info
  – Structures are not amenable to text-based query/rendering techniques
  – The majority of structural users have never heard (nor want to hear!) about SPARQL, the query language for RDF
  – The commonly preferred/expected way to query is by 'click'
Semantic Web for Structures needs new Web tools that allow navigation by clicking on structural features
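
A click-driven tool can still use SPARQL internally while hiding it from the user. The sketch below shows one possible shape for that translation. It is an assumption about how such a tool could work, not a description of Chem-BLAST. The `ex:` namespace, the predicate names and the `query_for_click` function are hypothetical.

```python
# Hypothetical mapping from the kind of feature clicked to an RDF predicate.
_PREDICATE_FOR_KIND = {
    "fragment": "ex:hasFragment",
    "subgroup": "ex:hasSubgroup",
    "class": "ex:hasClass",
}


def query_for_click(feature_kind, feature_label):
    """Build the SPARQL text for a click on one structural feature."""
    try:
        predicate = _PREDICATE_FOR_KIND[feature_kind]
    except KeyError:
        raise ValueError(f"unknown feature kind: {feature_kind!r}") from None
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = feature_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <http://example.org/chem#>\n"
        "SELECT ?compound WHERE {\n"
        f'  ?compound {predicate} "{escaped}" .\n'
        "}"
    )


# The user clicks the Phenyl fragment; the tool builds the query for them.
query = query_for_click("fragment", "Phenyl")
```

The user sees only a clickable structural feature, and the query text never has to appear on screen.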

Chem-BLAST for Structural Semantic Web
http://bioinfo.nist.gov/SemanticWeb_pr3d/chemblast.do
Prasanna et al. PROTEINS 60, 1-4 (2005).
Prasanna et al. PROTEINS 63(4), 907-917 (2006).
Download publications

Future Plans
Extend the work to chemical structures from Protein Data Bank
If interest exists, hold a workshop at NIST. Proposed dates: last two weeks of March 2008
  – Workshop will be in conjunction with the NIST-wide Ontology week
Possible collaboration with IUPAC (International Union of Pure and Applied Chemistry) and ChEBI
  – Contact: Colin Batchelor, BatchelorC@rsc.org, RSC Publishing, Royal Society of Chemistry
Community participation is essential for further development
Contact bhat@nist.gov 301 975 5448 (US)