RKB – A Semantic Knowledge Base for RNA. Michel Dumontier 1, José Cruz-Toledo 1, Marc Parisien 2, Francois Major 2. 1 Carleton University, 2 Université de Montreal

RKB – A Semantic Knowledge Base For RNA (RNA ontology consortium meeting)


DESCRIPTION

Increasingly sophisticated knowledge about RNA structure and function requires an inclusive knowledge representation that facilitates the integration of independently generated information arising from such efforts as genome sequencing projects, microarray analyses, structure determination and RNA SELEX experiments. While RNAML, an XML-based representation, has been proposed as an exchange format for a select subset of this information, it lacks the machine-understandable semantics that would make it arbitrarily user-extensible, as is the case for languages based on formal logic. Here, we describe an RNA knowledge base (RKB) for structure-based knowledge built with RDF/OWL Semantic Web technologies. RKB contains basic terminology for nucleic acid composition along with context- and model-specific representations of structural features such as sugar conformations, base pairings and base stackings. RKB is populated with RNA PDB entries and MC-Annotate structural annotations. The use of Semantic Web technologies accommodates the diverse interests of the RNA Ontology Consortium and supports knowledge discovery over independently published RNA knowledge.


Page 1: RKB – A Semantic Knowledge Base For RNA (RNA ontology consortium meeting)

RKB – A Semantic Knowledge Base for RNA

Michel Dumontier 1, José Cruz-Toledo 1

Marc Parisien 2, Francois Major 2

1 Carleton University
2 Université de Montreal


Carleton University -- Dumontier Lab dumontierlab.com

Objectives

i. To represent the biochemistry of nucleic acids and their structural characteristics, including base pairing and base stacking

ii. To represent context-specific knowledge

iii. To capture the structural annotation generated by MC-Annotate

5/25/2009


Guided design

• Modeling with upper-level ontologies
  – Interoperability and semantic coherency
  – New Upper Level Ontology (NULO)
    • Distinguishes objects, qualities, roles, processes and spatial regions
    • Based on BFO/RO, but for OWL


Biological Modeling

• Objects
  – Occupy space
    • Nucleic acids, nucleotides, riboses and phosphates
• Qualities
  – Intrinsic categorical or numeric-valued properties
    • A nucleotide bears the quality of conformation
• Roles
  – Defined by extrinsic interactions
    • A C3’ atom may hold the exo role during some sugar puckering
• Processes
  – Entities that extend in time
    • Structure determination, an interaction


Contextual Modeling of Nucleic Acids

• Base stacking varies between the different models of an XRD/NMR structure
• We need to know in which model a given piece of information is found
• We want to set the stage for representing simulations


RKB populated with PDB, MC-Annotate

• The ontology population involved 3 steps:

i. Assigning names

ii. Asserting class membership

iii. Assigning relations between entities

• The following naming convention was used:
  – Objects
    • Polymer: PDBID_cCHAIN
    • Residue: PDBID_cCHAIN_rRESIDUE
    • Atom: PDBID_cCHAIN_rRESIDUE_aAtom
  – Qualities/Roles
    • PDBID_mMODEL_cCHAIN_rRESIDUE_type
  – Processes
    • Structure determination: PDBID_mMODEL
    • Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
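The naming convention can be sketched as a handful of string builders. This is only an illustration in Python: the function names are hypothetical and not part of RKB, and the chain, residue and type values in the usage note are made up. Only the patterns themselves come from the slide.

```python
# Hypothetical helpers that follow the naming patterns listed on the slide.
# Function and parameter names are illustrative, not part of RKB.

def polymer_id(pdb_id, chain):
    # Polymer: PDBID_cCHAIN
    return f"{pdb_id}_c{chain}"

def residue_id(pdb_id, chain, residue):
    # Residue: PDBID_cCHAIN_rRESIDUE
    return f"{polymer_id(pdb_id, chain)}_r{residue}"

def atom_id(pdb_id, chain, residue, atom):
    # Atom: PDBID_cCHAIN_rRESIDUE_aAtom
    return f"{residue_id(pdb_id, chain, residue)}_a{atom}"

def quality_or_role_id(pdb_id, model, chain, residue, type_name):
    # Quality/Role: PDBID_mMODEL_cCHAIN_rRESIDUE_type
    return f"{pdb_id}_m{model}_c{chain}_r{residue}_{type_name}"

def structure_determination_id(pdb_id, model):
    # Structure determination: PDBID_mMODEL
    return f"{pdb_id}_m{model}"

def interaction_id(pdb_id, model, process_type, participant):
    # Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
    return f"{pdb_id}_m{model}_{process_type}_{participant}"
```

For example, `structure_determination_id("1B36", 1)` gives `1B36_m1`, the form of the model identifier used in the SPARQL queries later in the deck. Note that object names carry no model component, whereas qualities, roles and processes do, which is what makes them model-specific.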


Support for Leontis-Westhof Nomenclature


• The RKB incorporates the Leontis-Westhof (LW) nomenclature
• LW describes the three edges available for H-bonding interactions in purines (R) and pyrimidines (Y)
• Atom composition:
  i. Watson-Crick edge:
    • A(N6)/G(O6), R(N1), A(C2)/G(N2), U(O4)/C(N4), Y(N3) and Y(O2)
  ii. Hoogsteen edge (C-H edge for Y):
    • A(N6)/G(O6), R(N7), U(O4)/C(N4) and Y(C5)
  iii. Sugar edge:
    • A(C2)/G(N2), R(N3), Y(O2) and O2’
• cis and trans orientations
  • The relative orientation of the glycosidic bonds of the two interacting nucleotides
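The atom composition of the three edges can be written down as a small lookup table. The sketch below is illustrative Python, not RKB code: the table and function names are hypothetical, and R is taken to mean purines (A, G) and Y pyrimidines (C, U), following the standard IUPAC codes.

```python
# Edge membership transcribed from the slide.
# Each entry is (bases the atom applies to, atom name);
# "AG" stands for R (purines) and "CU" for Y (pyrimidines).
LW_EDGES = {
    "Watson-Crick": [("A", "N6"), ("G", "O6"), ("AG", "N1"),
                     ("A", "C2"), ("G", "N2"), ("U", "O4"),
                     ("C", "N4"), ("CU", "N3"), ("CU", "O2")],
    "Hoogsteen": [("A", "N6"), ("G", "O6"), ("AG", "N7"),
                  ("U", "O4"), ("C", "N4"), ("CU", "C5")],
    "Sugar": [("A", "C2"), ("G", "N2"), ("AG", "N3"),
              ("CU", "O2"), ("ACGU", "O2'")],
}

def atoms_on_edge(edge, base):
    """Atoms of `base` (one of A, C, G, U) that lie on the given edge."""
    return [atom for bases, atom in LW_EDGES[edge] if base in bases]

def edges_for_atom(base, atom):
    """Edges of `base` that include the given atom."""
    return [edge for edge in LW_EDGES if atom in atoms_on_edge(edge, base)]
```

The lookup makes it easy to see that some atoms are shared between edges: `edges_for_atom("A", "N6")` returns both the Watson-Crick and the Hoogsteen edge, and `edges_for_atom("G", "N2")` returns both the Watson-Crick and the Sugar edge.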


Support for LW+ Nomenclature

• The extension assigns faces to each edge:
  – Watson-Crick edge:
    • Wh, Ww and Ws faces
  – Hoogsteen edge:
    • C8(R), Hh, Hw and Bh
  – Sugar edge:
    • Bs, Ss(Y), Sw and O2’
• The Bh and Bs faces involve the Hoogsteen-side amino/keto group and the sugar-side amino/keto group, respectively
• The C8 face was introduced for the C8-H8 donor group in purines


Describing Base Pairs

• Base pairs are composed of interactions between the edges or faces of the interacting bases

• Role chains capture additional knowledge:

Objects that participate in sub-processes (face interactions) are also participants in the process whole (base pair):

hasPart ◦ hasParticipant -> hasParticipant

Objects are involved in processes when their qualities are participants:

isBearerOf ◦ isParticipantIn -> isParticipantIn
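What the two role chains entail can be sketched by applying them to a toy set of triples. This is plain Python standing in for what an OWL reasoner would do with property chain axioms. The function name and the individuals (`bp1`, `fi1`, `resA`, `faceX`) are hypothetical.

```python
def apply_chain(triples, first, second, implied):
    """Close a set of (subject, predicate, object) triples under the chain
    first o second -> implied, repeating until nothing new is inferred."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {
            (s, implied, o2)
            for (s, p, o) in inferred if p == first
            for (s2, p2, o2) in inferred if p2 == second and s2 == o
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

# Toy facts: a base pair with one face interaction, and a residue whose
# face quality participates in that interaction (all names are made up).
facts = {
    ("bp1", "hasPart", "fi1"),
    ("fi1", "hasParticipant", "resA"),
    ("resA", "isBearerOf", "faceX"),
    ("faceX", "isParticipantIn", "fi1"),
}

closed = apply_chain(facts, "hasPart", "hasParticipant", "hasParticipant")
closed = apply_chain(closed, "isBearerOf", "isParticipantIn", "isParticipantIn")
```

After closure, the set also contains `("bp1", "hasParticipant", "resA")` and `("resA", "isParticipantIn", "fi1")`, neither of which was asserted directly.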


The RKB is compatible with both the LW and the Saenger nomenclature for base pairs

• The semantics of the RKB enable the use of consistent base pair (bp) naming schemes

• The A-A base pair in model 4 of PDB:1B36 can be classified as a member of the following classes:
  – Saenger type II
  – LW Trans Hoogsteen/Hoogsteen (8)

NucleotideBasePair
  and ParallelBasePair
  and TransBasePair
  and HoogsteenHoogsteenBasePair
  and hasAgent exactly 2 AMP


Sugar Puckering

• The ribose ring presents two distinct puckering modes, envelope and twist

• The classification into either geometry is dependent on the relative position of the carbon atoms of the ribose to its C5’ atom

• Carbon atoms in a ribose thus bear either the endo or exo role with respect to the plane formed by the other atoms
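A minimal geometric sketch of how an endo or exo role could be assigned is given below. It assumes the usual convention that an atom is endo when it is displaced to the same side of the reference plane as C5’ and exo when it is displaced to the opposite side. The function names and the coordinates in the usage note are hypothetical, and this is not the MC-Annotate procedure.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def pucker_role(atom, plane_atoms, c5_prime):
    """Return 'endo', 'exo' or 'planar' for a ring atom.

    plane_atoms: three (x, y, z) ring-atom positions defining the reference plane.
    Assumption: endo means the atom lies on the same side of that plane as C5'.
    """
    p1, p2, p3 = plane_atoms
    normal = _cross(_sub(p2, p1), _sub(p3, p1))
    atom_side = _dot(normal, _sub(atom, p1))   # signed offset of the atom
    c5_side = _dot(normal, _sub(c5_prime, p1)) # signed offset of C5'
    if atom_side == 0 or c5_side == 0:
        return "planar"
    return "endo" if atom_side * c5_side > 0 else "exo"
```

With the plane z = 0 defined by `[(0, 0, 0), (1, 0, 0), (0, 1, 0)]` and C5’ at `(0, 0, 1)`, an atom at `(0.5, 0.5, 0.5)` is classified as endo and one at `(0.5, 0.5, -0.5)` as exo. In RKB terms, the returned label corresponds to the role that the carbon atom bears in a given model.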


Sugar Puckering (cont’d)

Our implementation of situational modeling ensures that each object is represented by a single entity throughout its lifetime. This avoids creating a distinct instance of the same object, with different attributes, for each spatio-temporal context.


RKB is SPARQL accessible

• SPARQL is a graph query language
• The instantiated ontology was loaded into Virtuoso 6
• SPARQL endpoint
  – http://codemonkey.dumontierlab.com/sparql/
• Specify graphs to restrict the search
  – http://semanticscience.org/rkb/mcannotate/pdb/dna
  – http://semanticscience.org/rkb/mcannotate/pdb/rna
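A request against the endpoint, restricted to one of the graphs, could be assembled as sketched below. The `query` and `default-graph-uri` parameters are the ones defined by the SPARQL Protocol. The helper name is hypothetical, the sketch only builds the URL and sends nothing, and whether the endpoint is still reachable is not something this example assumes.

```python
from urllib.parse import urlencode

# Endpoint and graph URIs as listed on the slide.
ENDPOINT = "http://codemonkey.dumontierlab.com/sparql/"
RNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/rna"
DNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/dna"

def build_request_url(query, graph=RNA_GRAPH, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET URL that restricts the query to one graph."""
    params = {"query": query, "default-graph-uri": graph}
    return endpoint + "?" + urlencode(params)

# Hypothetical usage: a trivial query against the RNA graph.
url = build_request_url("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
```

The resulting URL can be opened with any HTTP client. Passing `graph=DNA_GRAPH` restricts the same query to the DNA entries instead.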


Query 1: Find all face interactions (model 1 of PDB:1B36)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>

SELECT DISTINCT ?faceInteraction
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
}

Nucleotide base pairs are composed of one or more face interactions. Where these are known, as in the MC-Annotate results, we can retrieve them: 18 instances satisfy this query.


See results: http://tinyurl.com/porxdb


Query 2: Find all C8-mediated base pairs (model 1 of PDB:1B36)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>

SELECT DISTINCT ?faceInteraction ?residue ?hasC8Face
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasC8Face ss:isAgentIn ?faceInteraction .
  ?hasC8Face rdf:type ss:C8Face .
  ?residue ss:hasQuality ?hasC8Face .
}

Results: http://tinyurl.com/r7b5e4

Face interactions are mediated by the faces of bases. Nucleotides are related to their face qualities by the hasQuality relation, while faces are agents in the face interaction and are related to it by the hasAgent relation.


Query 3: Find base pairs involving a GMP sugar-sugar face (model 1 of PDB:1B36)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>

SELECT DISTINCT ?faceInteraction ?residue ?hasSSFace
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasSSFace rdf:type ss:SugarSugarFace .
  ?hasSSFace ss:isAgentIn ?faceInteraction .
  ?residue ss:hasQuality ?hasSSFace .
  ?residue rdf:type ss:GMP .
}

Results: http://tinyurl.com/qpup8z

This query builds on Query 2 in that it requires an Ss face to be on a GMP that participates in a base pair. Two GMPs in this structure have this face participating in base pairs with other nucleotides.


Query 4: Find Hoogsteen – O2’ face interactions (model 1 of PDB:1B36)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>

SELECT DISTINCT ?faceInteraction ?residue1 ?residue2 ?hasHhFace ?hasO2pFace
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasHhFace rdf:type ss:HoogsteenHoogsteenFace .
  ?hasHhFace ss:isAgentIn ?faceInteraction .
  ?hasO2pFace rdf:type ss:O2pFace .
  ?hasO2pFace ss:isAgentIn ?faceInteraction .
  ?residue1 ss:hasQuality ?hasHhFace .
  ?residue2 ss:hasQuality ?hasO2pFace .
}

Results: http://tinyurl.com/oo4fp8

The LW+ nomenclature describes base interactions in more detail. This query returns a single base pair in this structure.


Future Directions

• Specify Saenger nomenclature
• Map the output of other structural annotators (e.g. 3DNA)
• Extend structural knowledge with the 6 backbone angles
  – Range restrictions on classes
• SWRL / DL-safe rules or SPARQL queries are required to specify cyclic motifs
• Publish as part of the Bio2RDF network


RKB Availability

• Creative Commons License
• Google Code Project:
  – http://semanticscience.org
• Instructions: http://code.google.com/p/semanticscience/wiki/RKBDownload


References

• Dumontier, M., et al. (2009). RKB: A Semantic Web Knowledge Base for RNA. Accepted in Bio-Ontologies 2009, Stockholm, Sweden.

• Smith, B., et al. (2005). Relations in biomedical ontologies. Genome Biol, 6(5): R46.

• Leontis, N. B. and E. Westhof (2001). Geometric nomenclature and classification of RNA base pairs. RNA, 7(4): 499-512.

• Lemieux, S. and F. Major (2002). RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire. Nucleic Acids Res, 30(19): 4250-4263.

• Major, F. and P. Thibault (2005). Computer Modeling of RNA Three-Dimensional Structures. In: Encyclopedia of Molecular Cell Biology and Molecular Medicine, R.A. Meyers, Editor. Wiley-VCH Verlag GmbH & Co., Weinheim, p. 605-636.
