Creating a … Community Database Organism-Specific Database Model-Organism Database


SRI International Bioinformatics

Why Create a PGDB?

Perform pathway analyses as part of a genome project

Analyze omics data

Create a central information resource for the organism

Create an FBA model

Perform comparative analyses

Model Organism Databases

DBs that describe the genome and other information about an organism

Curated by experts for that organism
  No one group can curate all the world’s genomes
  Distribute workload across a community of experts to create a community resource

Every sequenced organism with an active experimental community requires a MOD

Integrate genome data with information about the biochemical and genetic network of the organism

Integrate literature-based information with computational predictions

Rationale for MODs

Each “complete” genome is incomplete in several respects:
  40%-60% of genes have no assigned function
  Roughly 7% of the assigned functions are incorrect
  Many assigned functions are non-specific

MODs are platforms for global analyses of an organism
  Interpret omics data in a pathway context
  In silico prediction of essential genes
  Characterize systems properties of metabolic and genetic networks
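The percentages on the "Rationale for MODs" slide combine as follows. This is a minimal sketch: the genome size of 4,000 genes is a hypothetical figure chosen for illustration, the function name is made up, and the 7% error rate is applied only to genes that have an assigned function, which is how the slide appears to intend it.

```python
def annotation_breakdown(total_genes: int, unassigned_pct: int,
                         incorrect_pct: int = 7) -> dict[str, int]:
    """Split a gene count using the slide's rough percentages.

    Uses integer arithmetic, so fractional genes are rounded down.
    """
    unassigned = total_genes * unassigned_pct // 100
    assigned = total_genes - unassigned
    # The error rate applies to assigned functions only (assumption).
    incorrect = assigned * incorrect_pct // 100
    return {
        "unassigned": unassigned,
        "assigned": assigned,
        "assigned_but_incorrect": incorrect,
        "assigned_not_counted_incorrect": assigned - incorrect,
    }


# Hypothetical genome size, evaluated at both ends of the 40%-60% range.
for pct in (40, 60):
    print(pct, annotation_breakdown(4000, pct))
```

Under these assumptions, only 1,488 to 2,232 of the 4,000 genes carry a function that is not counted as incorrect, and some of those assignments are still non-specific.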

What is Curation?

Ongoing updating and refinement of a PGDB

Correct false-positive and false-negative predictions

Incorporate information from experimental literature
  Update genome sequence
  Update gene functions, gene positions, gene names
  Author comments and citations
  Add new pathways, modify existing pathways
  Enter information about regulatory networks

Issues in Creating Public MODs

Obtaining funding

Scoping the project

Identify user community

Obtain buy-in and help from scientific community

IT: Set up database server, Web server

Hire and train curators

Questions

Do you intend to make your PGDB public and to update it on an ongoing basis?

To create a Model Organism Database?


Administering Pathway Tools

Obtaining Pathway Tools

Free to non-commercial organizations

To obtain license agreement go to BioCyc.org and click on Software/Database Download

Follow Installation Guide

ptools-local directory
  Locate in common directory
  PGDBs created by all users who use this ptools installation
  PGDBs downloaded via the registry
  ptools-init.dat for this ptools installation

New Pathway Tools Releases

Major releases = External software releases
  Twice per year
  Announced on ptools-users mailing list

Minor releases twice per year affect only our BioCyc.org Web site and flatfile distributions

We support one prior release only

Releases announced on [email protected]

Read release notes at http://brg.ai.sri.com/ptools/release-notes.html

Install process: Upgrade schema of your DB (software assisted)

PGDB Storage: File or Relational Database

File storage:
  Advantages:
    No RDBMS installation and configuration
  Disadvantages:
    Must be loaded and saved in its entirety
    No transaction history
    No concurrent access for multiple users

Oracle/MySQL storage:
  Advantages:
    Faster read access, faster saves
    Concurrent update access for multiple users
    Stores history of all PGDB updates
  Disadvantages:
    RDBMS must be installed and configured

Multiuser Access to PGDBs

PGDB stored within one Oracle or MySQL server

Each curator installs PTools on their workstation

Different curators can use different software platforms

Workstations query RDBMS server via internet

Local disk cache speeds access

For each frame access, PTools queries
  In-memory cache, disk cache, RDBMS server

After curator saves changes, all changes made by other users are loaded into curator’s session
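The frame lookup described on the "Multiuser Access to PGDBs" slide (in-memory cache, disk cache, RDBMS server) can be pictured as a tiered cache. The sketch below is not Pathway Tools code. It assumes the three tiers are consulted in the listed order and that a frame found in a slower tier is copied into the faster ones. The class name, the dictionary standing in for the disk cache, and the `fetch_from_server` callback are all hypothetical.

```python
from typing import Callable, Optional


class TieredFrameStore:
    """Look up a frame in memory, then on disk, then on the server."""

    def __init__(self, fetch_from_server: Callable[[str], Optional[dict]]):
        self.memory: dict[str, dict] = {}
        self.disk: dict[str, dict] = {}  # stand-in for an on-disk cache
        self.fetch_from_server = fetch_from_server

    def get(self, frame_id: str) -> Optional[dict]:
        # Tier 1: in-memory cache.
        if frame_id in self.memory:
            return self.memory[frame_id]
        # Tier 2: local disk cache; promote the hit into memory.
        if frame_id in self.disk:
            frame = self.disk[frame_id]
            self.memory[frame_id] = frame
            return frame
        # Tier 3: the RDBMS server; populate both caches on success.
        frame = self.fetch_from_server(frame_id)
        if frame is not None:
            self.disk[frame_id] = frame
            self.memory[frame_id] = frame
        return frame
```

With this layout the server is contacted only for frames that neither cache holds, which is consistent with the slide's remark that the local disk cache speeds access.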

How to Release a PGDB?

Decide on release frequency and schedule
  Don’t wait until it’s perfect to release it!

Freeze curation for 1 week
  Quality assurance

Run consistency checker
  Tools -> Consistency Checker
  Also updates organism-summary statistics

Update publications, authors in organism frame
  Update via Organism editor

Create new version of PGDB
  ptools-local/pgdbs/yeastcyc/1.0/kb/yeastbase.ocelot
  Edit against the new version, release the old version

Author release notes

Register PGDB in SRI PGDB registry
  Will allow SRI to include it in BioCyc
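The "Create new version of PGDB" step can be illustrated for a file-based PGDB. This sketch assumes, based only on the example path above, that each version lives in its own subdirectory of the PGDB directory (for example `yeastcyc/1.0`). The function name and the version number `1.1` are hypothetical, and Pathway Tools may provide its own command for creating a version, which should be preferred where it exists.

```python
import shutil
from pathlib import Path


def create_new_version(pgdb_dir: Path, old: str, new: str) -> Path:
    """Copy a frozen version directory to a new version directory.

    Assumes one subdirectory per version under pgdb_dir (an assumption
    drawn from the example path, not a documented layout).
    """
    src = pgdb_dir / old
    dst = pgdb_dir / new
    if not src.is_dir():
        raise FileNotFoundError(f"no such version directory: {src}")
    if dst.exists():
        raise FileExistsError(f"version already exists: {dst}")
    shutil.copytree(src, dst)
    return dst


# Example call (not executed here):
# create_new_version(Path("ptools-local/pgdbs/yeastcyc"), "1.0", "1.1")
```

After the copy, curation would continue in the new directory while the untouched old directory is the one that gets released.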

Pathway Tools Data Import/Export

File -> Export, File -> Import

Export/import to/from tab-delimited files

Export to Genbank, SBML, BioPAX

Export to attribute-value files

Attribute-value files can be imported into BioWarehouse
  Relational database system for bioinformatics database integration
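Exported attribute-value files can also be read with a short script. The sketch below assumes the layout commonly seen in BioCyc flat files: one `ATTRIBUTE - value` pair per line, `#` for comment lines, a leading `/` for continuation lines, and `//` to end a record. Check these assumptions against the files your installation actually produces. The function name and the sample records are hypothetical.

```python
from collections import defaultdict


def parse_attribute_value(text: str) -> list[dict[str, list[str]]]:
    """Parse attribute-value text into a list of records.

    Each record maps an attribute name to the list of its values,
    since an attribute may repeat within a record.
    """
    records = []
    current = defaultdict(list)
    last_attr = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.strip() == "//":
            # End of record.
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            last_attr = None
        elif line.startswith("/") and last_attr is not None:
            # Continuation of the previous value.
            current[last_attr][-1] += "\n" + line[1:]
        else:
            attr, sep, value = line.partition(" - ")
            if not sep:
                continue  # line does not fit the assumed layout
            current[attr].append(value)
            last_attr = attr
    if current:
        records.append(dict(current))
    return records


# Hypothetical two-record sample, for illustration only.
SAMPLE = """\
# sample export
UNIQUE-ID - GENE-1
COMMON-NAME - abcA
SYNONYMS - abc1
SYNONYMS - abc2
//
UNIQUE-ID - GENE-2
COMMON-NAME - xyzB
//
"""

for record in parse_attribute_value(SAMPLE):
    print(record["UNIQUE-ID"][0], record.get("SYNONYMS", []))
```

Values are kept as lists so that repeated attributes, such as the two `SYNONYMS` lines in the sample, are not overwritten.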

Napster Comes to Bioinformatics

Public sharing of Pathway/Genome Databases

PGDB registry maintained by SRI at URL http://biocyc.org/registry.html

Registry operations
  List contents of registry
  Download PGDBs listed in the registry
  Register PGDBs you have created

Registry Details

Why register your PGDB?
  Declare existence of your PGDB in a central location
  Facilitate its download by other scientists
  Facilitate its inclusion in BioCyc.org

Why download a PGDB?
  Desktop Navigator provides more functionality than Web
  Comparative operations
  Programmatic querying and processing of PGDB

Registration process
  Registered PGDBs have open availability by default
  Authors can provide their own license agreements
  Registered PGDBs reside in authors’ FTP site or HTTP server

Desktop versus Web Mode

Pathway Tools runs in two different modes:
  Desktop mode
  Web mode (e.g., BioCyc.org)

Desktop vs Web functionality in Pathway Tools

http://biocyc.org/desktop-vs-web-mode.shtml

You can run both desktop and web modes at your site

Your PTools web server need not be open to the public