The Pathway Tools Schema
SRI International Bioinformatics
Motivations for Understanding Schema
Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
When writing complex queries to PGDBs, those queries must name classes and slots within the schema
A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Reference
Pathway Tools User’s Guide, Volume I Appendix A: Guide to the Pathway Tools Schema
Web of Relationships for One Enzyme
Figure: the frames surrounding succinate dehydrogenase.
Genes: sdhA, sdhB, sdhC, sdhD
Polypeptides: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2
Enzyme: Succinate dehydrogenase
Enzymatic-reaction
Reaction: Succinate + FAD = fumarate + FADH2
Pathway: TCA Cycle
Frame Data Model
Frame Data Model -- organizational structure for a PGDB
Knowledge base (KB, Database, DB)
Frames
Slots
Facets
Annotations
Knowledge Base
Collection of frames and their associated slots, values, facets, and annotations
AKA: Database, PGDB
Can be stored within: an Oracle DB, a disk file, or a Pathway Tools binary program
Frames
Entities with which facts are associated
Kinds of frames: classes (Genes, Pathways, Biosynthetic Pathways) and instances, also called objects (trpA, TCA cycle)
Classes: superclass(es), subclass(es), instance(s)
A symbolic frame name (id, key) uniquely identifies each frame
Frame IDs
Naming conventions for frame IDs
Uniqueness of frame IDs
Frame IDs must be unique within a PGDB
Goal: the same frame ID within different PGDBs should refer to the same biological entity
Because many frames are imported from MetaCyc, this helps ensure consistency of frame names
Frame IDs for newly created frames (not imported) are generated by Pathway Tools
Those frame IDs contain a PGDB-specific identifier
Example: CPLXzz-nnnn, as in CPLXB3-0035
Slots
Encode attributes/properties of a frame: integer, real number, string, symbol
Represent relationships between frames: the value of a slot is the identifier of another frame
Every slot is described by a “slot frame” in a KB that defines meta information about that slot
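As an illustration of the two roles a slot can play, here is a minimal Python sketch of a frame store. It is a toy model, not the Pathway Tools or Ocelot implementation: the `Frame` class, the `kb` dictionary, and `add_frame` are hypothetical names introduced only for this example, and the slot names are written in lowercase for convenience.

```python
# Toy model only: NOT the Pathway Tools / Ocelot API.
from dataclasses import dataclass, field


@dataclass
class Frame:
    frame_id: str                               # symbolic name, unique within the KB
    slots: dict = field(default_factory=dict)   # slot name -> list of values


kb = {}  # frame ID -> Frame


def add_frame(frame_id, slots):
    """Create a frame, refusing duplicate IDs (frame IDs must be unique)."""
    if frame_id in kb:
        raise ValueError(f"duplicate frame ID: {frame_id}")
    kb[frame_id] = Frame(frame_id, {name: list(vals) for name, vals in slots.items()})
    return kb[frame_id]


add_frame("Sdh-flavo", {})
add_frame("sdhA", {
    "common-name": ["sdhA"],     # attribute slot: the value is a plain string
    "product": ["Sdh-flavo"],    # relationship slot: the value is another frame's ID
})

# A relationship is followed by looking the slot value up as a frame ID.
product_frame = kb[kb["sdhA"].slots["product"][0]]
print(product_frame.frame_id)
```

The point of the sketch is that a relationship is nothing more than a slot whose value happens to be the key of another frame, which is why queries must name slots precisely.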
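Slot Links
Figure: the succinate dehydrogenase web of relationships, with the connecting slots labeled.
Genes (sdhA, sdhB, sdhC, sdhD) -- product --> polypeptides (Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2)
Polypeptides -- component-of --> Succinate dehydrogenase
Succinate dehydrogenase -- catalyzes --> Enzymatic-reaction
Enzymatic-reaction -- reaction --> Succinate + FAD = fumarate + FADH2
Reaction -- in-pathway --> TCA Cycle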
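The slot labels on this slide (product, component-of, catalyzes, reaction, in-pathway) form a chain that can be walked from a gene to its pathways. The Python sketch below does this over a toy dictionary. It is not Pathway Tools code: the frame IDs `enzrxn-1` and `rxn-1` are hypothetical placeholders, and `follow` is a helper invented for the example.

```python
# Toy KB: frame ID -> {slot name -> list of frame IDs}. Illustrative only.
kb = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["enzrxn-1"]},  # hypothetical ID
    "enzrxn-1": {"reaction": ["rxn-1"]},                     # hypothetical ID
    "rxn-1": {"in-pathway": ["TCA Cycle"]},
    "TCA Cycle": {},
}

GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]


def follow(start, slot_path):
    """Follow a sequence of slots from one frame; return the frames reached."""
    frontier = [start]
    for slot in slot_path:
        reached = []
        for frame in frontier:
            for value in kb.get(frame, {}).get(slot, []):
                if value not in reached:   # keep order, drop duplicates
                    reached.append(value)
        frontier = reached
    return frontier


pathways = follow("sdhA", GENE_TO_PATHWAY)
print(pathways)
```

A multi-slot traversal like this is the shape most complex PGDB queries take, which is why the slot names along the chain need to be known exactly.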
Slots
Number of values: single-valued, or multivalued (sets, bags)
Slot values: any Lisp object (integer, real, string, or symbol such as a frame name)
Slotunits define properties of slots: datatypes, classes, constraints
Two slots are inverses if they encode opposite relationships
Example: slot Product in class Genes and slot Gene in class Polypeptides
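To make the idea of inverse slots concrete, the sketch below shows one way a store could keep the Product/Gene pair consistent: writing one direction also writes the other. This is an assumption-laden toy in Python, not a description of how Ocelot maintains inverses. The `INVERSES` table and this `add_slot_value` are hypothetical.

```python
# Toy illustration of inverse slots; not the Ocelot implementation.
INVERSES = {"product": "gene", "gene": "product"}

kb = {"sdhA": {}, "Sdh-flavo": {}}  # frame ID -> {slot name -> list of values}


def add_slot_value(frame, slot, value):
    """Add value to frame.slot and mirror it on the inverse slot, if any."""
    values = kb[frame].setdefault(slot, [])
    if value not in values:
        values.append(value)
    inverse = INVERSES.get(slot)
    if inverse is not None and value in kb:
        back = kb[value].setdefault(inverse, [])
        if frame not in back:
            back.append(frame)


add_slot_value("sdhA", "product", "Sdh-flavo")
print(kb["Sdh-flavo"]["gene"])
```

With inverses in place, a query can start from either end of the relationship: from the gene via Product, or from the polypeptide via Gene.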
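Representation of Function
Figure: the succinate dehydrogenase web of relationships (genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; reaction Succinate + FAD = fumarate + FADH2; TCA Cycle), annotated with example slots.
Reaction: EC#, Keq
Enzymatic-reaction: Cofactors, Inhibitors
Protein: Molecular wt, pI
Gene: Left-end-position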
Monofunctional Monomer
Figure: Gene -> Monomer -> Enzymatic-reaction -> Reaction -> Pathway
Bifunctional Monomer
Figure: Gene -> Monomer, where the Monomer has two Enzymatic-reactions
Monomer -> Enzymatic-reaction -> Reaction -> Pathway
Monomer -> Enzymatic-reaction -> Reaction
Monofunctional Multimer
Figure: four Genes -> four Monomers -> Multimer -> Enzymatic-reaction -> Reaction -> Pathway
Pathway and Substrates
Figure: a Pathway containing several Reactions, with the connecting slots labeled.
Reaction -- in-pathway --> Pathway
Reaction -- left --> Reactant-1, Reactant-2
Reaction -- right --> Product-1, Product-2
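Transcriptional Regulation
Figure: frames representing regulation of the trp operon.
Transcription unit: trpLEDCBA
Genes: trpL, trpE, trpD, trpC, trpB, trpA
Promoter: pro001
DNA binding site: site001
Proteins and ligands: RpoSig70, TrpR*trp, apoTrpR, trp
Interactions: Int001, Int003, Int005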
Principal Classes
Class names are capitalized, plural, separated by dashes
Genetic-Elements, with subclasses: Chromosomes, Plasmids
Genes
Transcription-Units
RNAs, with subclasses: rRNAs, snRNAs, tRNAs, Charged-tRNAs
Proteins, with subclasses: Polypeptides, Protein-Complexes
Principal Classes
Reactions, with subclasses: Transport-Reactions
Enzymatic-Reactions
Pathways
Compounds-And-Elements
Slots in Multiple Classes
Common-Name
Synonyms
Comment
Citations
DB-Links
Genes Slots
Component-Of (links to replicon, transcription unit)
Left-End-Position
Right-End-Position
Centisome-Position
Transcription-Direction
Product
Proteins Slots
Molecular-Weight-Seq
Molecular-Weight-Exp
pI
Locations
Modified-Form
Unmodified-Form
Component-Of
Polypeptides Slots
Gene
Protein-Complexes Slots
Components
Reactions Slots
EC-Number
Left, Right
DeltaG0
Keq
Spontaneous?
Enzymatic-Reactions Slots
Enzyme
Reaction
Activators
Inhibitors
Physiologically-Relevant
Cofactors
Prosthetic-Groups
Alternative-Substrates
Alternative-Cofactors
Pathways Slots
Reaction-List
Predecessors
Primaries
GKB Editor
Browse class hierarchy and slot definitions
Tools -> Ontology Browser
GKB Editor described at http://www.ai.sri.com/~gkb/user-man.html
Pathway Tools Data Access Mechanisms
Introduction
Many ways to access and update PGDBs
APIs in Java, Perl, and Lisp
Import/export of files in many formats
Registry of Pathway/Genome Databases
Import PGDB data into BioWarehouse
Updating a PGDB from an external genome DB
Pathway Tools APIs
Support programmatic queries and updates to PGDBs
APIs in Java, Perl, and Lisp all provide access to a common set of procedures:
Generic Frame Protocol -- Ocelot object database API
Additional Pathway Tools functions
For more information see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Generic Frame Protocol (GFP)
A library of procedures for accessing Ocelot DBs
GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html
A small number of GFP functions are sufficient for most complex queries
Knowledge of Pathway Tools schema is critical for using the APIs:
Appendix I of Pathway Tools User’s Guide, Vol I
Generic Frame Protocol
get-class-all-instances(Class): returns the instances of Class
Key Pathway Tools classes: Genetic-Elements, Genes, Proteins, Polypeptides (a subclass of Proteins), Protein-Complexes (a subclass of Proteins), Pathways, Reactions, Compounds-And-Elements, Enzymatic-Reactions, Transcription-Units, Promoters, DNA-Binding-Sites
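Because Polypeptides and Protein-Complexes are subclasses of Proteins, asking for all instances of Proteins should also reach instances of those subclasses. The Python sketch below models that behavior on a toy class hierarchy. It only imitates the name of get-class-all-instances and is not the GFP implementation. Both tables are hypothetical, and the instance names are the labels used earlier in this deck rather than real frame IDs.

```python
# Toy class hierarchy; illustrative only, not the GFP implementation.
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
    "Polypeptides": [],
    "Protein-Complexes": [],
}

DIRECT_INSTANCES = {
    "Proteins": [],
    "Polypeptides": ["Sdh-flavo", "Sdh-Fe-S"],
    "Protein-Complexes": ["Succinate dehydrogenase"],
}


def get_class_all_instances(cls):
    """Return direct instances of cls plus instances of all its subclasses."""
    found = list(DIRECT_INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        found.extend(get_class_all_instances(sub))
    return found


print(get_class_all_instances("Proteins"))
print(get_class_all_instances("Polypeptides"))
```

Queries typically start this way: pick one of the key classes, enumerate its instances, then filter them by slot values.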
Generic Frame Protocol
Notation: Frame.Slot means a specified slot of a specified frame
get-slot-value(Frame Slot): returns the first value of Frame.Slot
get-slot-values(Frame Slot): returns all values of Frame.Slot as a list
slot-has-value-p(Frame Slot): returns T if Frame.Slot has at least one value
member-slot-value-p(Frame Slot Value): returns T if Value is one of the values of Frame.Slot
print-frame(Frame): prints the contents of Frame
Note: Frame and Slot must be symbols!
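The read operations above can be pictured with a small Python model in which a KB is a dictionary of frames and each frame is a dictionary of slots. The function names below echo the GFP names only for readability. The bodies are a toy written for this example and make no claim about how Ocelot implements them, for instance returning `None` where Lisp would return NIL.

```python
# Toy stand-in for a KB: frame ID -> {slot name -> list of values}.
# NOT the GFP implementation; only the function names echo the slide.
kb = {
    "Succinate dehydrogenase": {
        "components": ["Sdh-flavo", "Sdh-Fe-S", "Sdh-membrane-1", "Sdh-membrane-2"],
    },
    "Sdh-flavo": {"gene": ["sdhA"]},
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot as a list (empty if the slot is unset)."""
    return list(kb[frame].get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if there is none."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """True if frame.slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of frame.slot."""
    return value in get_slot_values(frame, slot)


print(get_slot_value("Sdh-flavo", "gene"))
print(get_slot_values("Succinate dehydrogenase", "components"))
```

In this model, the difference between get-slot-value and get-slot-values matters for multivalued slots such as Components, where taking only the first value silently drops the rest.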
Generic Frame Protocol
coercible-to-frame-p(Thing): returns T if Thing is the name of a frame, or a frame object
save-kb: saves the current KB
Generic Frame Protocol: Update Operations
put-slot-value(Frame Slot Value): replace the current value(s) of Frame.Slot with Value
put-slot-values(Frame Slot Value-List): replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
add-slot-value(Frame Slot Value): add Value to the current value(s) of Frame.Slot, if any
remove-slot-value(Frame Slot Value): remove Value from the current value(s) of Frame.Slot
replace-slot-value(Frame Slot Old-Value New-Value): in Frame.Slot, replace Old-Value with New-Value
remove-local-slot-values(Frame Slot): remove all of the values of Frame.Slot
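The differences between the update operations are easiest to see side by side. The Python sketch below mimics them on a toy dictionary-based KB. It is illustrative only: the synonym strings are placeholders, and skipping duplicate values in `add_slot_value` is an assumption of this example, not documented GFP behavior.

```python
# Toy KB: frame ID -> {slot name -> list of values}. Placeholder data only.
kb = {"sdhA": {"synonyms": ["name-a", "name-b"]}}


def put_slot_value(frame, slot, value):
    """Replace whatever frame.slot holds with the single value."""
    kb[frame][slot] = [value]


def put_slot_values(frame, slot, value_list):
    """Replace whatever frame.slot holds with a copy of value_list."""
    kb[frame][slot] = list(value_list)


def add_slot_value(frame, slot, value):
    """Append value, keeping existing values (assumption: no duplicates)."""
    values = kb[frame].setdefault(slot, [])
    if value not in values:
        values.append(value)


def remove_slot_value(frame, slot, value):
    """Remove one value, leaving the others in place."""
    values = kb[frame].get(slot, [])
    if value in values:
        values.remove(value)


def replace_slot_value(frame, slot, old_value, new_value):
    """Swap old_value for new_value without disturbing other values."""
    values = kb[frame].get(slot, [])
    kb[frame][slot] = [new_value if v == old_value else v for v in values]


def remove_local_slot_values(frame, slot):
    """Drop every value of frame.slot."""
    kb[frame].pop(slot, None)


add_slot_value("sdhA", "synonyms", "name-c")
replace_slot_value("sdhA", "synonyms", "name-a", "name-x")
remove_slot_value("sdhA", "synonyms", "name-b")
print(kb["sdhA"]["synonyms"])
```

The practical distinction is that the put operations overwrite everything in the slot, while add, remove, and replace touch a single value. In a real PGDB, changes made this way persist only after the KB is saved.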
Additional Pathway Tools Functions: Semantic Inference Layer
The semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
http://bioinformatics.ai.sri.com/ptools/ptools-fns.html
Internal note
Note: Refer to local copy of ptools-fns.html to go through the semantic inference layer functions
File Import/Export Capabilities
PGDBs can be exported in whole or part to:
SBML (Systems Biology Markup Language, sbml.org): import supported by many simulation packages; File -> Export -> Selected Reactions to SBML File
Pathway Tools attribute-value and column-delimited format files: http://brg.ai.sri.com/ptools/flatfile-format.shtml
Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files
Dump selected frames to a single file: File -> Export -> Selected Frames to File
Import/Export
Import from attribute-value or column-delimited files: File -> Import -> Frames From File
Import/export to/from an internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations:
Edit -> Add Pathway to File Export List
File -> Export -> Selected Pathways to File
File -> Import -> Pathways from File
Import/export to/from MDL molfile format:
Edit -> Import compound structure from molfile
Edit -> Export compound structure to molfile
Miscellaneous Exports
Overview -> Highlight -> Save to File
Overview -> Highlight -> Load from File
Gene / Protein Sequence / Save to file
Chromosome -> Show Sequence of a Segment of Replicon
Napster Comes to Bioinformatics
Public sharing of Pathway/Genome Databases
PGDB registry maintained by SRI at URL http://biocyc.org/registry.html
Registry operations: list contents of the registry; download PGDBs listed in the registry; register PGDBs you have created
Registry Details
Why register your PGDB? Declare existence of your PGDB in a central location; facilitate download by other scientists
Why download a PGDB? Desktop Navigator provides more functionality than the Web version; comparative operations; programmatic querying and processing of the PGDB
Registration process: registered PGDBs have open availability by default; authors can provide their own license agreements; registered PGDBs reside on authors’ FTP sites
BioWarehouse
Biospice.org
New Import/Export Tools
Suggestions?
Volunteers?
Updating a PGDB From an External Genome DB
Example: AraCyc forms a pathway module to the TAIR DB
TAIR is authoritative source for gene and gene-product information
Update AraCyc to reflect updates in TAIR
Proposed Approach
Export TAIR to PathoLogic files
Build AraCyc2 from those PathoLogic files (automated PathoLogic only)
Compare AraCyc1 (A1) to AraCyc2 (A2):
A. Import new genes/proteins from A2 to A1
B. Delete from A1 genes/proteins not found in A2
C. Rename genes/proteins whose names changed from A2 to A1
Run name matcher on A1’
Check for pathways with no enzymes and report them so the user can keep any that PathoLogic would otherwise delete
What about enzymes that were assigned to a pathway by the hole filler?
Re-run pathway predictor
Remember which pathways the user deletes so they are not re-predicted by PathoLogic
Consider movement of genes from contig to chromosome
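The comparison step (A, B, C) amounts to a set difference between two gene inventories. The Python sketch below shows that core logic under two loud assumptions that the slides do not state: genes in A1 and A2 can be matched by a stable identifier, and A2's names win because TAIR is authoritative. The function `plan_update` and the identifiers `g1` to `g4` are hypothetical.

```python
# Sketch of the A1-vs-A2 comparison. Assumes genes are matched by a stable ID
# (hypothetical IDs below); real PGDBs may need fuzzier matching.
def plan_update(a1, a2):
    """a1, a2: dicts mapping gene ID -> gene name. Returns three sorted lists."""
    to_import = sorted(a2.keys() - a1.keys())    # A. present only in A2
    to_delete = sorted(a1.keys() - a2.keys())    # B. present only in A1
    to_rename = sorted(                          # C. in both, names differ
        gene for gene in a1.keys() & a2.keys() if a1[gene] != a2[gene]
    )
    return to_import, to_delete, to_rename


aracyc1 = {"g1": "alpha", "g2": "beta", "g3": "gamma"}
aracyc2 = {"g1": "alpha", "g2": "beta-2", "g4": "delta"}

to_import, to_delete, to_rename = plan_update(aracyc1, aracyc2)
print("import:", to_import)
print("delete:", to_delete)
print("rename:", to_rename)
```

Computing the plan separately from applying it leaves room for the review steps listed above, such as reporting pathways that would lose all their enzymes before anything is deleted.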