The Pathway Tools Schema


Page 2: Motivations for Understanding Schema

SRI International Bioinformatics

Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB

When writing complex queries to PGDBs, those queries must name classes and slots within the schema

A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

Page 3: Reference

Pathway Tools User’s Guide, Volume I, Appendix A: Guide to the Pathway Tools Schema

Page 4: Web of Relationships for One Enzyme

[Figure: web of relationships for one enzyme. Node labels: genes sdhA, sdhB, sdhC, sdhD; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle]

Page 5: Frame Data Model

Frame Data Model -- organizational structure for a PGDB

Knowledge base (KB, Database, DB)

Frames

Slots

Facets

Annotations

Page 6: Knowledge Base

Collection of frames and their associated slots, values, facets, and annotations

AKA: Database, PGDB

Can be stored within:
    An Oracle DB
    A disk file
    A Pathway Tools binary program

Page 7: Frames

Entities with which facts are associated

Kinds of frames:
    Classes: Genes, Pathways, Biosynthetic Pathways
    Instances (objects): trpA, TCA cycle

Classes have: superclass(es), subclass(es), instance(s)

A symbolic frame name (id, key) uniquely identifies each frame

Page 8: Frame IDs

Naming conventions for frame IDs

Uniqueness of frame IDs
    Frame IDs must be unique within a PGDB
    Goal: the same frame ID within different PGDBs should refer to the same biological entity
    Because many frames are imported from MetaCyc, this helps ensure consistency of frame names
    Frame IDs for newly created frames (not imported) are generated by Pathway Tools
    Those frame IDs contain a PGDB-specific identifier
    Example: CPLXzz-nnnn, such as CPLXB3-0035

Page 9: Slots

Encode attributes/properties of a frame
    Integer, real number, string, symbol

Represent relationships between frames
    The value of a slot is the identifier of another frame

Every slot is described by a “slot frame” in a KB that defines meta information about that slot
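The frame and slot ideas above can be pictured with a small in-memory model. This is only an illustrative Python sketch, not the Ocelot or Pathway Tools implementation: the `Frame` class, the `kb` dictionary, and `add_frame` are hypothetical names, and the slot values are taken from the succinate dehydrogenase example in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A toy frame: a unique ID plus a mapping from slot name to a list of values."""
    frame_id: str
    slots: dict = field(default_factory=dict)


# Toy knowledge base: frame ID -> Frame. Dictionary keys enforce ID uniqueness.
kb = {}


def add_frame(frame_id, slots):
    """Create a frame, refusing to reuse an ID that is already in the KB."""
    if frame_id in kb:
        raise ValueError(f"duplicate frame ID: {frame_id}")
    kb[frame_id] = Frame(frame_id, {name: list(vals) for name, vals in slots.items()})
    return kb[frame_id]


# Attribute slot: the value is a plain datum (here a string).
# Relationship slot: the value is the ID of another frame.
add_frame("sdhA", {"Common-Name": ["sdhA"], "Product": ["Sdh-flavo"]})
add_frame("Sdh-flavo", {"Common-Name": ["Sdh-flavo"], "Gene": ["sdhA"]})

# Following a relationship slot means looking up the frame its value names.
product_id = kb["sdhA"].slots["Product"][0]
product_frame = kb[product_id]
```

Storing every slot as a list keeps single-valued and multivalued slots uniform in this sketch; a real frame system would also consult the slot frame for datatype and cardinality constraints.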

Page 10: Slot Links

[Figure: the web of relationships for succinate dehydrogenase (sdhA, sdhB, sdhC, sdhD; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle), with links labeled by slot name: product, component-of, catalyzes, reaction, in-pathway]
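Read as a chain, the slot links in the figure lead from a gene to the pathway its product takes part in. The sketch below is a toy Python model under stated assumptions: the `kb` dictionary and the `follow` helper are hypothetical, and the pairing of each slot with the two frames it connects (gene to product, monomer to complex, and so on) is one reading of the figure, not data taken from a PGDB.

```python
# Toy data: frame ID -> {slot name: [values]}, using the labels from the figure.
kb = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["Enzymatic-reaction"]},
    "Enzymatic-reaction": {"reaction": ["Succinate + FAD = fumarate + FADH2"]},
    "Succinate + FAD = fumarate + FADH2": {"in-pathway": ["TCA Cycle"]},
}


def follow(start, slot_path):
    """Follow a chain of slots from one frame; return the frames reached at the end."""
    current = [start]
    for slot in slot_path:
        reached = []
        for frame_id in current:
            for value in kb.get(frame_id, {}).get(slot, []):
                if value not in reached:
                    reached.append(value)
        current = reached
    return current


GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]
pathways = follow("sdhA", GENE_TO_PATHWAY)
```

A query of this shape only works if each slot is named exactly as the schema names it, which is the motivation the deck gives for learning the schema.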

Page 11: Slots

Number of values
    Single valued
    Multivalued: sets, bags

Slot values
    Any LISP object: integer, real, string, symbol (frame name)

Slotunits define properties of slots: datatypes, classes, constraints

Two slots are inverses if they encode opposite relationships
    Example: slot Product in class Genes and slot Gene in class Polypeptides
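To make the inverse-slot idea concrete, here is a minimal Python sketch of one way a store could keep a pair of inverse slots consistent. It makes no claim about how Ocelot does this internally; `INVERSES`, `kb`, and `add_link` are hypothetical names, and only the Product/Gene pair comes from the slide.

```python
# Declared inverse pairs (from the slide: Genes.Product <-> Polypeptides.Gene).
INVERSES = {"Product": "Gene", "Gene": "Product"}

# Toy knowledge base: frame ID -> {slot name: [values]}.
kb = {"sdhA": {}, "Sdh-flavo": {}}


def add_link(frame_id, slot, target_id):
    """Add target_id to frame.slot and mirror the link in the inverse slot, if any."""
    values = kb[frame_id].setdefault(slot, [])
    if target_id not in values:
        values.append(target_id)
    inverse = INVERSES.get(slot)
    if inverse is not None:
        back = kb[target_id].setdefault(inverse, [])
        if frame_id not in back:
            back.append(frame_id)


# Writing one direction makes the opposite direction available too.
add_link("sdhA", "Product", "Sdh-flavo")
```

With the mirror link in place, a query can start from either the gene or the polypeptide and reach the other frame in one step.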

Page 12: Representation of Function

[Figure: the web of relationships for succinate dehydrogenase (sdhA, sdhB, sdhC, sdhD; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle), annotated with example attributes: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position]

Page 13: Monofunctional Monomer

[Figure: one Gene, one Monomer, one Enzymatic-reaction, one Reaction, one Pathway]

Page 14: Bifunctional Monomer

[Figure: one Gene, one Monomer, two Enzymatic-reactions, two Reactions, one Pathway]

Page 15: Monofunctional Multimer

[Figure: four Genes, four Monomers, one Multimer, one Enzymatic-reaction, one Reaction, one Pathway]

Page 16: Pathway and Substrates

[Figure: a Pathway and its Reactions, with substrates Reactant-1, Reactant-2, Product-1, Product-2; links labeled by slot name: in-pathway, left, right]

Page 17: Transcriptional Regulation

[Figure: transcriptional regulation example. Labels: site001, pro001, trpL, trpE, trpD, trpC, trpB, trpA, trpLEDCBA, trp, TrpR*trp, apoTrpR, RpoSig70, Int001, Int003, Int005]

Page 18: Principal Classes

Class names are capitalized, plural, separated by dashes

Genetic-Elements, with subclasses: Chromosomes, Plasmids

Genes

Transcription-Units

RNAs, including rRNAs, snRNAs, tRNAs, Charged-tRNAs

Proteins, with subclasses: Polypeptides, Protein-Complexes

Page 19: Principal Classes

Reactions, with subclasses: Transport-Reactions

Enzymatic-Reactions

Pathways

Compounds-And-Elements

Page 20: Slots in Multiple Classes

Common-Name

Synonyms

Comment

Citations

DB-Links

Page 21: Genes Slots

Component-Of (links to replicon, transcription unit)

Left-End-Position

Right-End-Position

Centisome-Position

Transcription-Direction

Product

Page 22: Proteins Slots

Molecular-Weight-Seq

Molecular-Weight-Exp

pI

Locations

Modified-Form

Unmodified-Form

Component-Of

Page 23: Polypeptides Slots

Gene

Page 24: Protein-Complexes Slots

Components

Page 25: Reactions Slots

EC-Number

Left, Right

DeltaG0

Keq

Spontaneous?

Page 26: Enzymatic-Reactions Slots

Enzyme

Reaction

Activators

Inhibitors

Physiologically-Relevant

Cofactors

Prosthetic-Groups

Alternative-Substrates

Alternative-Cofactors

Page 27: Pathways Slots

Reaction-List

Predecessors

Primaries

Page 28: GKB Editor

Browse class hierarchy and slot definitions

Tools -> Ontology Browser

GKB Editor described at http://www.ai.sri.com/~gkb/user-man.html

Page 29: Pathway Tools Data Access Mechanisms

Page 30: Introduction

MANY ways to access and update PGDBs

APIs in Java, Perl, and Lisp

Import/export of files in many formats

Registry of Pathway/Genome Databases

Import PGDB data into BioWarehouse

Updating a PGDB from an external genome DB

Page 31: Pathway Tools APIs

Support programmatic queries and updates to PGDBs

APIs in Java, Perl, and Lisp all provide access to a common set of procedures:
    Generic Frame Protocol -- Ocelot object database API
    Additional Pathway Tools functions

For more information see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Page 32: Generic Frame Protocol (GFP)

A library of procedures for accessing Ocelot DBs

GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html

A small number of GFP functions are sufficient for most complex queries

Knowledge of Pathway Tools schema is critical for using the APIs:

Appendix I of Pathway Tools User’s Guide, Vol I

Page 33: Generic Frame Protocol

get-class-all-instances (Class): returns the instances of Class

Key Pathway Tools classes:
    Genetic-Elements
    Genes
    Proteins
    Polypeptides (a subclass of Proteins)
    Protein-Complexes (a subclass of Proteins)
    Pathways
    Reactions
    Compounds-And-Elements
    Enzymatic-Reactions
    Transcription-Units
    Promoters
    DNA-Binding-Sites
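The behavior of get-class-all-instances can be pictured with a toy class hierarchy. This Python sketch is not the GFP implementation: it assumes, as the function name suggests, that instances of subclasses are included in the result, and the `SUBCLASSES` and `DIRECT_INSTANCES` tables are hypothetical data built from names that appear in this deck.

```python
# Toy class hierarchy and instance tables (hypothetical contents).
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
    "Polypeptides": [],
    "Protein-Complexes": [],
}
DIRECT_INSTANCES = {
    "Proteins": [],
    "Polypeptides": ["Sdh-flavo", "Sdh-Fe-S"],
    "Protein-Complexes": ["Succinate dehydrogenase"],
}


def get_class_all_instances(cls):
    """Return direct instances of cls plus instances of all its subclasses."""
    result = list(DIRECT_INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        for instance in get_class_all_instances(sub):
            if instance not in result:
                result.append(instance)
    return result


all_proteins = get_class_all_instances("Proteins")
```

Under that assumption, a query on Proteins reaches both polypeptides and complexes, which is why the subclass relationships in the list above matter when choosing the class to query.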

Page 34: Generic Frame Protocol

Notation: Frame.Slot means a specified slot of a specified frame

get-slot-value(Frame Slot): returns the first value of Frame.Slot

get-slot-values(Frame Slot): returns all values of Frame.Slot as a list

slot-has-value-p(Frame Slot): returns T if Frame.Slot has at least one value

member-slot-value-p(Frame Slot Value): returns T if Value is one of the values of Frame.Slot

print-frame(Frame): prints the contents of Frame

Note: Frame and Slot must be symbols!
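The read operations above can be mimicked on a toy store to show how they relate to one another. This is a Python sketch of the described semantics only, not the GFP library or any of the Java, Perl, or Lisp APIs: the function names are Python spellings of the GFP names, `kb` is a hypothetical dictionary, the synonym values are invented placeholders, and returning `None` for an empty slot is an assumption.

```python
# Toy knowledge base: frame ID -> {slot name: [values]}.
kb = {
    "sdhA": {
        "Product": ["Sdh-flavo"],
        "Synonyms": ["syn-1", "syn-2"],  # invented placeholder values
    }
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot as a list (empty if the slot has none)."""
    return list(kb[frame].get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if there is no value."""
    values = kb[frame].get(slot, [])
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if frame.slot has at least one value."""
    return len(kb[frame].get(slot, [])) > 0


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of frame.slot."""
    return value in kb[frame].get(slot, [])


first_synonym = get_slot_value("sdhA", "Synonyms")
all_synonyms = get_slot_values("sdhA", "Synonyms")
```

The single-value accessor is convenient for single-valued slots such as Product, while multivalued slots such as Synonyms call for the list form.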

Page 35: Generic Frame Protocol

coercible-to-frame-p (Thing): returns T if Thing is the name of a frame, or a frame object

save-kb: saves the current KB

Page 36: Generic Frame Protocol: Update Operations

put-slot-value(Frame Slot Value): replace the current value(s) of Frame.Slot with Value

put-slot-values(Frame Slot Value-List): replace the current value(s) of Frame.Slot with Value-List, which must be a list of values

add-slot-value(Frame Slot Value): add Value to the current value(s) of Frame.Slot, if any

remove-slot-value(Frame Slot Value): remove Value from the current value(s) of Frame.Slot

replace-slot-value(Frame Slot Old-Value New-Value): in Frame.Slot, replace Old-Value with New-Value

remove-local-slot-values(Frame Slot): remove all of the values of Frame.Slot
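The differences among the update operations are easiest to see by applying them in sequence to one slot. The following Python sketch models only the semantics stated above on a hypothetical dictionary store; it is not the GFP API, the frame `F` and slot `S` are placeholders, and details such as whether duplicates are rejected are assumptions (this toy version does not check).

```python
# Toy knowledge base with one frame "F" whose slot "S" starts with one value.
kb = {"F": {"S": ["a"]}}


def put_slot_value(frame, slot, value):
    """Replace the current value(s) of frame.slot with a single value."""
    kb[frame][slot] = [value]


def put_slot_values(frame, slot, value_list):
    """Replace the current value(s) of frame.slot with a list of values."""
    kb[frame][slot] = list(value_list)


def add_slot_value(frame, slot, value):
    """Add value to the current value(s) of frame.slot, if any."""
    kb[frame].setdefault(slot, []).append(value)


def remove_slot_value(frame, slot, value):
    """Remove value from the current value(s) of frame.slot."""
    kb[frame][slot] = [v for v in kb[frame].get(slot, []) if v != value]


def replace_slot_value(frame, slot, old_value, new_value):
    """In frame.slot, replace old_value with new_value."""
    kb[frame][slot] = [new_value if v == old_value else v
                       for v in kb[frame].get(slot, [])]


def remove_local_slot_values(frame, slot):
    """Remove all of the values of frame.slot."""
    kb[frame].pop(slot, None)


put_slot_values("F", "S", ["a", "b"])    # S holds a, b
add_slot_value("F", "S", "c")            # S holds a, b, c
remove_slot_value("F", "S", "b")         # S holds a, c
replace_slot_value("F", "S", "a", "z")   # S holds z, c
after_edits = list(kb["F"]["S"])
put_slot_value("F", "S", "only")         # S holds only
after_put = list(kb["F"]["S"])
remove_local_slot_values("F", "S")       # S has no values
```

Note the contrast between the put operations, which overwrite whatever the slot held, and the add, remove, and replace operations, which edit the existing value list.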

Page 37: Additional Pathway Tools Functions: Semantic Inference Layer

The semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB

http://bioinformatics.ai.sri.com/ptools/ptools-fns.html

Page 38: Internal note

Note: Refer to local copy of ptools-fns.html to go through the semantic inference layer functions

Page 39: File Import/Export Capabilities

PGDBs can be exported in whole or in part to:

SBML (Systems Biology Markup Language, sbml.org)
    Import supported by many simulation packages
    File -> Export -> Selected Reactions to SBML File

Pathway Tools attribute-value format and column-delimited format files
    http://brg.ai.sri.com/ptools/flatfile-format.shtml
    Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files
    Dump selected frames to a single file: File -> Export -> Selected Frames to File

Page 40: Import/Export

Import from attribute-value or column-delimited files
    File -> Import -> Frames From File

Import/export in an internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations
    Edit -> Add Pathway to File Export List
    File -> Export -> Selected Pathways to File
    File -> Import -> Pathways from File

Import/export in MDL molfile format
    Edit -> Import compound structure from molfile
    Edit -> Export compound structure to molfile

Page 41: Miscellaneous Exports

Overview -> Highlight -> Save to File

Overview -> Highlight -> Load from File

Gene / Protein Sequence / Save to file

Chromosome -> Show Sequence of a Segment of Replicon

Page 42: Napster Comes to Bioinformatics

Public sharing of Pathway/Genome Databases

PGDB registry maintained by SRI at URL http://biocyc.org/registry.html

Registry operations
    List contents of registry
    Download PGDBs listed in the registry
    Register PGDBs you have created

Page 43: Registry Details

Why register your PGDB?
    Declare existence of your PGDB in a central location
    Facilitate download by other scientists

Why download a PGDB?
    Desktop Navigator provides more functionality than Web
    Comparative operations
    Programmatic querying and processing of PGDB

Registration process
    Registered PGDBs have open availability by default
    Authors can provide their own license agreements
    Registered PGDBs reside on authors’ FTP site

Page 44: BioWarehouse

Biospice.org

Page 45: New Import/Export Tools

Suggestions?

Volunteers?

Page 46: Updating a PGDB From an External Genome DB

Example: AraCyc forms a pathway module to the TAIR DB

TAIR is authoritative source for gene and gene-product information

Update AraCyc to reflect updates in TAIR

Page 47: Proposed Approach

Export TAIR to PathoLogic files

Build AraCyc2 from those PathoLogic files (automated PathoLogic only)

Compare AraCyc1 (A1) to AraCyc2 (A2)
    A. Import new genes/proteins from A2 to A1
    B. Delete from A1 genes/proteins not found in A2
    C. Rename genes/proteins whose names changed between A1 and A2

Run name matcher on A1’

Check for pathways with no enzymes and report them, so the user can keep any that PathoLogic would otherwise delete

What about enzymes that were assigned to a pathway by the hole filler?

Re-run pathway predictor

Remember which pathways the user deletes so they are not re-predicted by PathoLogic

Consider movement of genes from contig to chromosome
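Steps A to C of the comparison amount to a three-way diff between the two builds. The Python sketch below illustrates that idea only; it is not part of Pathway Tools or PathoLogic. It assumes each gene or protein carries a stable identifier shared by A1 and A2, which the slides do not state, and the function name, the identifiers, and the names in the sample data are all invented.

```python
def diff_pgdbs(a1, a2):
    """Compare two builds, each a dict mapping a stable ID to a name.

    Returns (to_import, to_delete, to_rename):
      to_import: IDs present in A2 but not A1 (step A)
      to_delete: IDs present in A1 but not A2 (step B)
      to_rename: ID -> new name, for IDs in both whose name differs (step C)
    """
    ids1, ids2 = set(a1), set(a2)
    to_import = sorted(ids2 - ids1)
    to_delete = sorted(ids1 - ids2)
    to_rename = {k: a2[k] for k in sorted(ids1 & ids2) if a1[k] != a2[k]}
    return to_import, to_delete, to_rename


# Invented sample data: G2 was dropped, G3 was renamed, G4 is new.
a1 = {"G1": "geneA", "G2": "geneB", "G3": "geneC"}
a2 = {"G1": "geneA", "G3": "geneC2", "G4": "geneD"}

to_import, to_delete, to_rename = diff_pgdbs(a1, a2)
```

Without stable identifiers, a rename would look like one deletion plus one import, so curated data attached to the old frame would be lost; that is one reason the rename step is listed separately.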