Curation of the EcoCyc Database: The EcoCyc Update Project Martha Arnaud Scientific Database Curator Bioinformatics Research Group SRI International http://www.ecocyc.org http://www.biocyc.org


Curation of the EcoCyc Database:
The EcoCyc Update Project

Martha Arnaud
Scientific Database Curator
Bioinformatics Research Group
SRI International

http://www.ecocyc.org
http://www.biocyc.org

EcoCyc Organization

EcoCyc collects information about multiple types of database objects

- Pathway *
- Reaction *
- Compound *
- Protein
- Gene *
- Transcription Unit

* hierarchies

[Diagram labels: Genes, Proteins, Reactions, Pathway, Compounds]

EcoCyc Statistics

- 176 pathways
- 992 enzymes
- 1006 enzymatic reactions
- 169 transporters
- 828 transcription units
- 1929 proteins have a comment (598 > 300 characters)

EcoCyc Pathway Information

http://biocyc.org:1555/ECOLI/new-image?type=PATHWAY&object=ALANINE-VALINESYN-PWY&detail-level=2
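The pathway display URL above carries three query parameters: `type`, `object`, and `detail-level`. As a minimal sketch, the snippet below splits that URL into its parts and rebuilds it with a different `detail-level`. The helper names are hypothetical, and whether the server accepts levels other than 2 is an assumption; the address dates from the time of the talk and may no longer resolve.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# URL copied from the slide.
PATHWAY_URL = (
    "http://biocyc.org:1555/ECOLI/new-image"
    "?type=PATHWAY&object=ALANINE-VALINESYN-PWY&detail-level=2"
)


def split_pathway_url(url):
    """Return the parsed URL and its query parameters as a flat dict."""
    parts = urlsplit(url)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts, query


def with_detail_level(url, level):
    """Rebuild the URL with a different detail-level (hypothetical helper)."""
    parts, query = split_pathway_url(url)
    query["detail-level"] = str(level)
    return urlunsplit(parts._replace(query=urlencode(query)))


_, params = split_pathway_url(PATHWAY_URL)
print(params["type"], params["object"], params["detail-level"])
print(with_detail_level(PATHWAY_URL, 3))
```

Only the query string changes; the organism identifier (`ECOLI`) stays in the path.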


…viewed with “More Detail”

EcoCyc Protein Information

[Screenshot callouts: comment, citations, reaction]

EcoCyc Gene Information

EcoCyc Metabolic Overview

http://biocyc.org/ov-expr.shtml

Static or animated views of expression data

EcoCyc Curation

- names and synonyms
- gene classes
- subunit composition of protein complexes
- location of gene product
- protein or complex molecular weight
- enzyme activity name
- enzyme properties (activators, inhibitors, cofactors)
- comment fields
- evidence citations
- reactions catalyzed
- pathway information

Build a new MOD or add a “Pathway Module”!

Pathway Tools Software

- Takes annotated genome
- Generates database, including pathway predictions

Freely available (academics/non-profits)

http://bioinformatics.ai.sri.com/ptools/

Pathway Tools software environment for creation, curation, analysis, and Web publishing of MODs

[email protected]

Current Pathway Tools Users

- Saccharomyces cerevisiae, SGD, Stanford University
- Arabidopsis thaliana, Carnegie Institution of Washington
- Plasmodium falciparum, Stanford University
- Mycobacterium tuberculosis, Stanford University
- Synechocystis, Carnegie Institution of Washington
- Methanococcus jannaschii, EBI

EcoCyc Strengths

Metabolism

Transport

Transcription regulation

EcoCyc into the Future:

“EcoCyc is not just metabolism anymore!”

…an integrated, review-level information resource on E. coli genomics and biochemistry…

The EcoCyc Update Project:

What do we need to do? Goals

Can we possibly get it done? Quantification

Where do we start? Priorities

How is it going? Progress

EcoCyc Update: Curation Goals

Expand database scope beyond metabolism, transporters, and transcription

Curate associated reactions and pathways

Stay current with the latest papers

Curate every gene product:
- literature-based descriptions
- comprehensive reference lists

EcoCyc Update: Quantification

4405 genes - 175 transcription factors - 168 transporters = 4062 genes to curate

Full-time curator: 4 days/week on curation
+ Part-time curator (70%), years 2-4

Year 1: 1600 hours
Year 2: 3000 hours
Year 3: 3000 hours
Year 4: 3000 hours

Total: 10,600 hours / 4062 genes = 2.6 hours per gene

Curation of abstracts
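The quantification above is simple arithmetic, and a short sketch makes it easy to re-run if the gene counts or staffing change. All figures come from the slide; the variable names are illustrative only.

```python
# Figures taken from the Quantification slide.
TOTAL_GENES = 4405
TRANSCRIPTION_FACTORS = 175
TRANSPORTERS = 168
HOURS_PER_YEAR = [1600, 3000, 3000, 3000]  # years 1 through 4

# Genes left after excluding the two groups listed on the slide.
genes_to_curate = TOTAL_GENES - TRANSCRIPTION_FACTORS - TRANSPORTERS

# Total curation time available over the four-year plan.
total_hours = sum(HOURS_PER_YEAR)

# Average time budget per gene.
hours_per_gene = total_hours / genes_to_curate

print(genes_to_curate)           # 4062
print(total_hours)               # 10600
print(round(hours_per_gene, 1))  # 2.6
```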

EcoCyc Update: Priorities

1. Problems raised by users and advisors

2. Gene products that have new characterizations published in the literature

3. Gene products that have not yet been thoroughly curated

4. Gene products that have been curated, but have not been updated lately
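The four priorities above amount to an ordering rule over the curation backlog. The following is a minimal sketch of that rule, not the project's actual tooling: the `GeneProduct` record, its boolean fields, and the placeholder product names are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Priority tiers in the order given on the slide (1 = most urgent)."""
    USER_REPORTED = 1
    NEW_LITERATURE = 2
    NOT_YET_CURATED = 3
    NOT_UPDATED_LATELY = 4


@dataclass
class GeneProduct:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    user_reported_problem: bool = False
    new_publication: bool = False
    thoroughly_curated: bool = False


def priority_of(gp):
    """Assign the highest-ranking tier that applies to a gene product."""
    if gp.user_reported_problem:
        return Priority.USER_REPORTED
    if gp.new_publication:
        return Priority.NEW_LITERATURE
    if not gp.thoroughly_curated:
        return Priority.NOT_YET_CURATED
    return Priority.NOT_UPDATED_LATELY


def curation_queue(gene_products):
    """Order the backlog by tier; sorted() is stable, so ties keep input order."""
    return sorted(gene_products, key=priority_of)


backlog = [
    GeneProduct("product-A", thoroughly_curated=True),
    GeneProduct("product-B"),
    GeneProduct("product-C", new_publication=True),
    GeneProduct("product-D", user_reported_problem=True, thoroughly_curated=True),
]
for gp in curation_queue(backlog):
    print(priority_of(gp).name, gp.name)
```

A gene product that matches several tiers is placed in the most urgent one, which is one reasonable reading of a numbered priority list.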

Where are we now?

807 gene products curated.

807/4062 = 19.9% of the total

(excluding transport and transcription factors)

4-year plan: Curate 615 genes in Year 1

We are meeting our goal!
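The progress figures can be checked the same way. In this sketch the numbers are from the slides; the observation that the Year 1 target matches 1600 hours divided by 2.6 hours per gene is an inference, not something the slides state.

```python
# Figures taken from the slides.
GENES_TO_CURATE = 4062
CURATED_SO_FAR = 807
YEAR_1_TARGET = 615
YEAR_1_HOURS = 1600
HOURS_PER_GENE = 2.6

# Share of the total backlog already curated.
percent_done = 100 * CURATED_SO_FAR / GENES_TO_CURATE

# How far ahead of the Year 1 target the project is.
genes_ahead = CURATED_SO_FAR - YEAR_1_TARGET

# Inferred derivation of the Year 1 target (an assumption, see lead-in).
implied_target = round(YEAR_1_HOURS / HOURS_PER_GENE)

print(round(percent_done, 1))  # 19.9
print(genes_ahead)             # 192
print(implied_target)          # 615
```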

The EcoCyc Collaboration

SRI
- Peter Karp, PI
- Suzanne Paley, Software Engineer
- John Pick, Software Engineer
- Martha Arnaud, Curator

UCD
- John Ingraham, Project Leader

MBL
- Monica Riley, Editor Emerita

UNAM
- Julio Collado-Vides, Project Leader
- Socorro Gama-Castro, Curator
- Martin Peralta, Curator

TIGR
- Ian Paulsen, Project Leader
- Mark Hance, Curator

UCSD
- Milton Saier, Project Leader
- Can Tran, Curator

Funding: NIH National Center for Research Resources


Pathway/Genome DBs Created by External Users

- Saccharomyces cerevisiae, Stanford University: pathway.yeastgenome.org/biocyc/
- Plasmodium falciparum, Stanford University: plasmocyc.stanford.edu
- Mycobacterium tuberculosis, Stanford University: BioCyc.org
- Arabidopsis thaliana and Synechocystis, Carnegie Institution of Washington: Arabidopsis.org:1555
- Methanococcus jannaschii, EBI: Maine.ebi.ac.uk:1555

Other PGDBs in progress by 40 other users
Software freely available
Each PGDB owned by its creator