The Gene Ontology & Gene Ontology Annotation resources


Mélanie Courtot, Ph.D.
EMBL-EBI
GO/GOA Project leader
SPOT/UniProt content teams
mcourtot@ebi.ac.uk

Industry workshop, March 17 2016

In 1999, a collaboration between 3 Model Organism Databases

Ashburner et al., Nat Genet. 2000 May;25(1):25-9.

• A way to capture biological knowledge for individual gene products in a written and computable form

• A set of concepts and their relationships to each other, arranged as a hierarchy

http://www.ebi.ac.uk/QuickGO

Less specific concepts

More specific concepts

The Gene Ontology

1. Molecular Function: an elemental activity or task or job
• protein kinase activity
• insulin receptor activity

2. Biological Process: a commonly recognized series of events
• cell division

3. Cellular Component: where a gene product is located
• mitochondrion
• mitochondrial matrix
• mitochondrial inner membrane

Aims of the GO project

• Develop the ontology
• Annotate gene products using ontology terms
• Provide a public resource of data and tools

Develop the ontology

• An OWL ontology of >41,000 classes
• biological process, cellular component, molecular function
• >14,000 imported classes (CL, Uberon, ChEBI, NCBI_tax)
• >136,000 logical axioms, including:
  • ~72,000 subClassOf axioms between named GO classes
  • ~41,000 simple existential restrictions (subClassOf R some C)
• EL expressivity => fast, scalable reasoning (with ELK)

https://www.cs.ox.ac.uk/isg/tools/ELK/
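The two axiom types listed above (subClassOf between named classes, and subClassOf R some C) can be pictured as labelled edges in a graph. The sketch below is a toy illustration only, not how GO or ELK is implemented: the term labels are taken from the Cellular Component examples in this talk, "cellular component" is assumed as a root label, and the specific links are simplified assumptions.

```python
from collections import defaultdict

# Toy axioms as (child, relation, parent) triples.
# "is_a" stands for a subClassOf axiom between named classes;
# "part_of" stands for an existential restriction (subClassOf part_of some C).
# The links are simplified assumptions for illustration.
AXIOMS = [
    ("mitochondrion", "is_a", "cellular component"),
    ("mitochondrial matrix", "part_of", "mitochondrion"),
    ("mitochondrial inner membrane", "part_of", "mitochondrion"),
]


def build_parents(axioms):
    """Index the axioms as child -> set of (relation, parent)."""
    parents = defaultdict(set)
    for child, relation, parent in axioms:
        parents[child].add((relation, parent))
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(AXIOMS)
print(sorted(ancestors("mitochondrial matrix", PARENTS)))
```

A real reasoner does far more than this upward walk (for example, classifying terms from their logical definitions); the sketch only shows why a more specific term implies its less specific ancestors.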

Building the GO

• The GO editorial team
• Submission via GitHub, https://github.com/geneontology/
• Submissions via TermGenie, http://go.termgenie.org
• ~80% terms are now created this way

Annotate gene products

[Figure: gene -> GO term associations linking the GO database to genome and protein databases]

A GO annotation is …

…a statement that a gene product:

1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component
2. as described in a particular reference
3. as determined by a particular method

Accession   Name   GO ID        GO term name                      Reference      Evidence code
P00505      GOT2   GO:0004069   aspartate transaminase activity   PMID:2731362   IDA
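The three parts of an annotation map naturally onto a small record type. The following is a minimal sketch using the example row from this slide; the class name, field names and the `describe` helper are hypothetical and do not correspond to any official GO file format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One GO annotation; field names are illustrative assumptions."""
    accession: str      # identifier of the gene product
    name: str           # gene product name or symbol
    go_id: str          # GO term identifier
    go_term: str        # GO term name
    reference: str      # where the statement is described
    evidence_code: str  # how the statement was determined

    def describe(self) -> str:
        """Render the annotation as the three-part statement."""
        return (
            f"{self.name} ({self.accession}) is annotated to "
            f"{self.go_term} [{self.go_id}], as described in "
            f"{self.reference}, as determined by {self.evidence_code}."
        )


# The example row shown on the slide.
EXAMPLE = Annotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence_code="IDA",
)
print(EXAMPLE.describe())
```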

Tracking provenance

Evidence is grouped as:
• Experimental data
• Computational analysis
• Author statements/curator inference
• (+ Inferred from Electronic Annotation)

http://www.evidenceontology.org/
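As a rough illustration of these evidence groups, the sketch below sorts a handful of GO evidence codes into the categories named on this slide. The selection of codes is partial, and the dictionary and function names are hypothetical; the Evidence Ontology linked above is the authoritative source.

```python
# Partial, illustrative grouping of GO evidence codes.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author statement / curator inference": {"TAS", "NAS", "IC"},
    "electronic": {"IEA"},
}


def evidence_group(code):
    """Return the group a code belongs to, or 'unknown' if not listed."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


# IDA is the evidence code of the GOT2 example annotation.
print(evidence_group("IDA"))
print(evidence_group("IEA"))
```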

Manual annotations

• Time-consuming process producing lower numbers of annotations (~2,800 taxa covered)
• More specific GO terms
• Manual annotation is essential for creating predictions

Aleksandra Shypitsyna

Elena Speretta

Alex Holmes

Tony Sawford

Electronic Annotations

• Quick way of producing large numbers of annotations
• Annotations use less-specific GO terms
• Only source of annotation for ~438,000 non-model organism species

orthology taxon constraints

Provide a public resource of data and tools

Number of annotations in UniProt-GOA database (March 2016):
• 2,752,604 manual annotations*
• 269,207,317 electronic annotations

* Includes manual annotations integrated from external model organism and specialist groups

http://www.ebi.ac.uk/GOA

https://www.ebi.ac.uk/QuickGO/

Enrichment analysis

Sample   Reference
40%      20%
20%      20%

=> The sample is over-enriched for
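The comparison of sample and reference percentages can be made quantitative with a statistical test. The talk does not say which test is used; the sketch below assumes a one-sided hypergeometric test, a common choice for enrichment analysis, and the gene counts (a reference of 100 genes and a sample of 10) are invented so that the percentages match the slide.

```python
from math import comb


def enrichment_p_value(sample_hits, sample_size, reference_hits, reference_size):
    """Upper-tail hypergeometric probability P(X >= sample_hits).

    Assumes the sample is drawn without replacement from the reference set,
    of which `reference_hits` genes carry the GO term of interest.
    """
    total = comb(reference_size, sample_size)
    upper = min(sample_size, reference_hits)
    favourable = sum(
        comb(reference_hits, i)
        * comb(reference_size - reference_hits, sample_size - i)
        for i in range(sample_hits, upper + 1)
    )
    return favourable / total


# Invented counts: 20 of 100 reference genes carry the term (20%).
# Term seen in 4 of 10 sample genes (40%) versus 2 of 10 (20%).
P_ENRICHED = enrichment_p_value(4, 10, 20, 100)
P_EXPECTED = enrichment_p_value(2, 10, 20, 100)
print(f"40% vs 20%: p = {P_ENRICHED:.4f}")
print(f"20% vs 20%: p = {P_EXPECTED:.4f}")
```

In practice many GO terms are tested at once, so the p-values would also need a multiple-testing correction, which this sketch leaves out.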

Spinocerebellar ataxia type 28

Paola Roncaglia

Novel biomarkers of rectal radiotherapy

Biomarker for diagnosis and prognosis

Gene expression changes in diabetes

Improved network analysis


GO slims

Many gene products are associated with a large number of descriptive, leaf GO nodes…

…however, annotations can be mapped up to a smaller set of parent GO terms.
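Mapping annotations up to a slim amounts to walking from each annotated term toward the root and keeping the slim terms encountered. The sketch below assumes a toy hierarchy built from the term labels used earlier in this talk; the gene names, the slim contents and the function names are hypothetical, and stopping at the first slim term on each path is one possible design choice rather than the behaviour of any particular GO tool.

```python
# Hypothetical fragment of the hierarchy: child -> parents.
PARENTS = {
    "mitochondrial matrix": {"mitochondrion"},
    "mitochondrial inner membrane": {"mitochondrion"},
    "mitochondrion": {"cellular component"},
}

# Hypothetical slim: the small set of terms we want to report.
SLIM = {"mitochondrion"}


def map_to_slim(term, parents, slim):
    """Return the slim terms found at or above `term`."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
            continue  # stop climbing once a slim term is reached
        stack.extend(parents.get(current, ()))
    return found


def slim_annotations(annotations, parents, slim):
    """Replace each gene's specific terms with the slim terms above them."""
    return {
        gene: set().union(*(map_to_slim(t, parents, slim) for t in terms))
        for gene, terms in annotations.items()
    }


# Hypothetical gene products annotated to leaf terms.
ANNOTATIONS = {
    "geneA": {"mitochondrial matrix", "mitochondrial inner membrane"},
    "geneB": {"mitochondrion"},
}
SLIMMED = slim_annotations(ANNOTATIONS, PARENTS, SLIM)
print(SLIMMED)
```

Both leaf annotations of geneA collapse onto the same slim term, which is what reduces redundancy in the summarised view.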

Slim generation for industry

• Collaboration funded by Roche
• Need a custom GO slim for analysis of gene sets of interest
  • Needs to be descriptive enough
  • Without redundancy
• Internal proprietary vocabulary: hard to maintain
• Desire to automatically map to GO

http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf

[Figure: Roche CV. Panels: GSEA with full GO; GSEA with Roche CV]

Courtesy Laura Badi

• Mapping query: participant_OR_reg_participant some cannabinoid

• Description: “A process in which a cannabinoid participates, or that regulates a process in which a cannabinoid participates.”
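The description of the mapping query can be read as a two-step lookup: find processes with the chemical as a participant, then add processes that regulate any of those. The sketch below mimics that reading over toy dictionaries. It is not an OWL reasoner, the process names and data are illustrative assumptions, and a real query would also take the class hierarchy (for example, kinds of cannabinoid) into account.

```python
# Hypothetical toy data: process -> chemicals that participate in it.
PARTICIPANTS = {
    "cannabinoid biosynthetic process": {"cannabinoid"},
    "glucose metabolic process": {"glucose"},
}

# Hypothetical toy data: process -> processes it regulates.
REGULATES = {
    "regulation of cannabinoid biosynthetic process": {
        "cannabinoid biosynthetic process"
    },
    "regulation of glucose metabolic process": {"glucose metabolic process"},
}


def participant_or_reg_participant(chemical):
    """Processes in which `chemical` participates, plus their regulators."""
    direct = {
        process
        for process, chemicals in PARTICIPANTS.items()
        if chemical in chemicals
    }
    regulating = {
        process
        for process, targets in REGULATES.items()
        if targets & direct
    }
    return direct | regulating


print(sorted(participant_or_reg_participant("cannabinoid")))
```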

Results

• We have successfully mapped 84% of terms from RCV (308/365) to OWL queries that can be used to replicate some proportion of the original manual mapping.
• In addition, these queries find thousands of terms that were missed in the original mapping.

David Osumi-Sutherland

GO SLIM (generic)

ROCHE CV – MANUAL ONLY

ROCHE CV MANUAL + AUTO

Acknowledgements

• GO editors and developers
• GO annotators
• The Gene Ontology (GO) Consortium
• Samples, Phenotype and Ontology team (Helen Parkinson)
• Protein Function Content team (Claire O’Donovan)
• Funding: EMBL-EBI, National Human Genome Research Institute (NHGRI)
