Gene Ontology Part 1: what is the GO?
http://www.geneontology.org
Harold J Drabkin
Senior Scientific Curator
The Jackson Laboratory
Mouse Genome Informatics
Bar Harbor, ME

What is the GO
The scope of the GO
The GO relationships

Using the GO for annotation
Anatomy of an annotation
Evidence codes
Qualifiers
Gene association files

What IS the GO

The Gene Ontology is a dictionary of concepts used to describe the normal properties of a gene product

It has concepts describing molecular functions
It has concepts describing biological processes
It has concepts describing cellular locations that the gene products are found in

Gene Ontology

Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
Built to be applicable to any organism
Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.

The GO

is NOT a list of genes or proteins
  although a term synonym might be a gene or protein name

does NOT track diseases
  although certain disease phenotypes might suggest the function of a gene product or a process that it may participate in
  you will not find “tumor suppressor activity/tumor suppression” as GO terms

The Gene Ontology Consortium Started Small

Original GO created in 2000
Three databases involved:
  FlyBase (Drosophila)
  MGI (Mouse)
  SGD (S. cerevisiae)

Used immediately

www.geneontology.org

More quickly joined...

Later databases:
  TAIR (Arabidopsis)
  TIGR (microbes including prokaryotes)
  SWISS-PROT (several thousand species inc. human)
  PSU (P. falciparum)
  ZFIN (zebrafish)
  PAMGO (plant pathogens)

Gene Ontology widely adopted

AgBase

Why do we need this?

Tactition
Tactile sense
Taction

perception of touch ; GO:0050975

Often the same concept is referred to by different names

Bud initiation?

Often the same term is used by different communities to mean different things...

More specifically

The GO is not just a flat list of terms

transcription factor activity
DNA binding
transcription regulator activity
membrane
mitochondrial membrane
glycolysis
nucleus
cytoplasm
ion transport
.....

There are also relationships between them.

is_a
DNA binding is a type of nucleic acid binding.
Nucleic acid binding is a type of binding.

And the terms can have more than one parent!

Ontology Structure

The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)

Terms can have more than one parent and zero, one or more children

Terms are linked by three relationships
  is_a
  part_of
  regulates (new)
    negatively regulates
    positively regulates

Ontology Structure

[Figure: example graph with the nodes cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane; edges are labeled is_a or part_of]
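A graph like this one can be sketched in a few lines of Python. The specific edges are an assumption about the figure (the slide image is not reproduced here), and `PARENTS` and `ancestors` are hypothetical names chosen for illustration, not part of any GO tool.

```python
# Each term maps to a list of (relationship, parent) pairs, so a term
# can have more than one parent. The edge choices are illustrative assumptions.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("chloroplast membrane")))
# ['cell', 'chloroplast', 'membrane']
```

Because the graph is acyclic, the walk always terminates; the `seen` set only prevents visiting a shared ancestor (here, cell) twice.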

http://www.ebi.ac.uk/ego

It gets complicated quickly

The 3 Gene Ontologies

Molecular Function = elemental activity/task
  the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective
  broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex
  subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Cellular Component: where a gene product acts

Molecular Function: activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity
insulin binding
insulin receptor activity

A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.

Sets of functions make up a biological process.

Biological Process: a commonly recognized series of events

gluconeogenesis
cell division
limb development

Mitochondrial P450 (CC24 PR01238; MITP450CC24)

An example…

Anatomy of a GO term

A GO term stanza in OBO format begins with [Term] and minimally has:

id:
name:
namespace:
def:
one or more relationships
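As a rough illustration of that stanza layout, the sketch below reads one [Term] stanza into a Python dictionary and reports which of the minimal tags are missing. The function names `parse_term_stanza` and `missing_required_tags` are hypothetical, and the parsing is deliberately simplified (it does not handle escaped characters or every OBO feature).

```python
REQUIRED_TAGS = ("id", "name", "namespace", "def")


def parse_term_stanza(text):
    """Parse one [Term] stanza into a dict mapping each tag to a list of values."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "[Term]":
        raise ValueError("stanza must begin with [Term]")
    term = {}
    for line in lines[1:]:
        if not line:
            continue
        tag, _, value = line.partition(":")
        # Drop the trailing "! comment" that may follow a value.
        value = value.split(" ! ")[0].strip()
        term.setdefault(tag.strip(), []).append(value)
    return term


def missing_required_tags(term):
    """List the minimal tags that the parsed stanza does not contain."""
    return [tag for tag in REQUIRED_TAGS if tag not in term]


STANZA = """
[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
is_a: GO:0000018 ! regulation of DNA recombination
"""

term = parse_term_stanza(STANZA)
print(term["id"], term["is_a"], missing_required_tags(term))
# ['GO:0000019'] ['GO:0000018'] []
```

Values are kept in lists because tags such as is_a can occur more than once in a stanza.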

More GO Term Stanzas

The Regulates Relationship

In the Beginning There Were Two Relationships

Is_a: denotes a subtype of its parent.
Part_of: denotes a portion of a parent.
  Is_part: If it exists, it is always a part of its parent (this is the relationship we use).
  Has_part: If there is a parent, then it has this as a part of it.

We made the regulation of something a part_of the something

But it’s not really part_of

So, what’s the issue with regulates?

Regulation is not always an inherent part of the process that it regulates

A speed-bump regulates the velocity of my car

50 mph 5 mph

We needed a better way to express ‘regulates’

We defined regulation as “any process that modulates the frequency, rate or extent of something.”

Something can be:

• A Biological Process
• A Molecular Function
• A Biological Quality

A ‘decomposed’ Term

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0006312 ! mitotic recombination
relationship: regulates GO:0006312 ! mitotic recombination

The intersection tags make up the logical definition. This places the ‘regulation’ term in the context of mitotic recombination.
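One way to read the logical definition programmatically is to separate the intersection_of tags into a general class (often called the genus) and the relationship-qualified parts (the differentia). The sketch below is a simplified illustration; `logical_definition` is a hypothetical helper, not a function from OBO-Edit or any GO library.

```python
STANZA_LINES = [
    "id: GO:0000019",
    "name: regulation of mitotic recombination",
    "is_a: GO:0000018 ! regulation of DNA recombination",
    "intersection_of: GO:0065007 ! biological regulation",
    "intersection_of: regulates GO:0006312 ! mitotic recombination",
    "relationship: regulates GO:0006312 ! mitotic recombination",
]


def logical_definition(lines):
    """Split intersection_of tags into a genus and (relation, target) pairs."""
    genus = None
    differentia = []
    for line in lines:
        tag, _, value = line.partition(": ")
        if tag != "intersection_of":
            continue
        # Strip the "! comment" and split what remains on whitespace.
        parts = value.split(" ! ")[0].split()
        if len(parts) == 1:
            genus = parts[0]  # a bare term id
        else:
            differentia.append((parts[0], parts[1]))  # relation + term id
    return genus, differentia


genus, differentia = logical_definition(STANZA_LINES)
print(genus, differentia)
# GO:0065007 [('regulates', 'GO:0006312')]
```

Read aloud, the result says: a biological regulation (GO:0065007) that regulates mitotic recombination (GO:0006312).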

The context of mitotic recombination

Old: ‘regulation of mitotic recombination’ was part of the graph, on top of ‘mitotic recombination’

Now

[Figure: graph with three edges labeled regulates]

What does this buy us?

The new relationship portrays the biology more accurately than part_of

Regulates
Positively regulates
Negatively regulates

The new logical definitions allow automated consistency checks as the ontology is developed.

The first implementation of cross-products in GO

Sets the stage for:
  Molecular function -> biological process
  Cell type -> biological process
  ChEBI -> biological process
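To give a feel for what an automated consistency check might involve, here is a minimal sketch. It assumes terms have already been parsed into dictionaries, and it tests only one rule: every relation-bearing intersection_of entry should also appear as an asserted relationship. This is an illustrative rule chosen for this example, not a description of the checks the GO editors actually run, and the term `EX:0000001` is invented to show a failing case.

```python
def check_regulates_consistency(terms):
    """Return ids of terms whose logical definition and asserted links disagree."""
    problems = []
    for term_id, term in terms.items():
        defined = set(term.get("intersection_of", []))
        asserted = set(term.get("relationship", []))
        # Only relation-bearing intersections (two tokens) need a matching link.
        expected = {entry for entry in defined if len(entry.split()) == 2}
        if not expected <= asserted:
            problems.append(term_id)
    return problems


TERMS = {
    "GO:0000019": {
        "intersection_of": ["GO:0065007", "regulates GO:0006312"],
        "relationship": ["regulates GO:0006312"],
    },
    # Hypothetical term, invented here: it still carries the old part_of link.
    "EX:0000001": {
        "intersection_of": ["GO:0065007", "regulates GO:0006312"],
        "relationship": ["part_of GO:0006312"],
    },
}

print(check_regulates_consistency(TERMS))
# ['EX:0000001']
```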

(On March 18th 2008)

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
narrow_synonym: "regulation of recombination within rDNA repeats" []
is_a: GO:0000018 ! regulation of DNA recombination
relationship: part_of GO:0006312 ! mitotic recombination

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0006312 ! mitotic recombination
relationship: regulates GO:0006312 ! mitotic recombination

Evolution of GO

GO term development was annotation-driven

Development directed by use:
  Terms added as new species annotated
  Terms added on an as-needed basis

Developed by an international consortium of biologists and computer scientists
  members from individual databases
  central office at EBI

Development involves collaboration with domain experts from different biological fields
  also formal ontologists

Important Consideration for Users

The GO changes daily
  new terms added
  additional relationships added
  terms removed: obsolete terms

GO Slims

What is a GO Slim

A GO Slim is a smaller slice of the GO that can be used to “bin” data into categories relevant to the user's experiment

Why use this?

you want to group several sections of the GO into a single broader category

you want to remove sections that are totally irrelevant for your assay (e.g., photosynthetic processes are irrelevant for birds).
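The binning idea can be sketched as follows: each annotated term is mapped up the graph to the closest slim term, and gene products are then counted per bin. Everything in this example is hypothetical (the parent links, the slim, the gene names and the helper functions `slim_bins` and `count_bins`); a real analysis would load the ontology and annotations from files.

```python
from collections import Counter

# Hypothetical parent links (child -> parents), for illustration only.
PARENTS = {
    "translational initiation": ["translation"],
    "translational elongation": ["translation"],
    "translation": ["biological_process"],
    "photosynthesis": ["biological_process"],
    "biological_process": [],
}

# Hypothetical slim: the broad categories we want to bin into.
SLIM = {"translation", "biological_process"}

# Hypothetical annotations: gene product -> annotated GO term names.
ANNOTATIONS = {
    "geneA": ["translational initiation"],
    "geneB": ["translational elongation", "photosynthesis"],
    "geneC": ["translation"],
}


def slim_bins(term, parents=PARENTS, slim=SLIM):
    """Return the closest slim terms at or above `term`."""
    if term in slim:
        return {term}
    bins = set()
    for parent in parents.get(term, []):
        bins |= slim_bins(parent, parents, slim)
    return bins


def count_bins(annotations):
    """Count gene products per slim bin, each gene at most once per bin."""
    counts = Counter()
    for terms in annotations.values():
        gene_bins = set()
        for term in terms:
            gene_bins |= slim_bins(term)
        counts.update(gene_bins)
    return dict(counts)


print(sorted(count_bins(ANNOTATIONS).items()))
# [('biological_process', 1), ('translation', 3)]
```

Here photosynthesis has no dedicated slim term, so it falls into the catch-all root category, which is one way a slim can push irrelevant sections out of the way.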

Several GO Slims are referenced in the gene_ontology.obo file

Section of OboEdit showing GO slims built into the ontology
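In an OBO file, slim membership is recorded with `subset:` tags on individual terms. A minimal sketch of pulling out the members of one slim is shown below; the OBO fragment is a hand-written example (the membership shown is illustrative, not copied from a release), and `terms_in_subset` is a hypothetical helper.

```python
OBO_TEXT = """\
subsetdef: goslim_generic "Generic GO slim"

[Term]
id: GO:0006412
name: translation
subset: goslim_generic

[Term]
id: GO:0006413
name: translational initiation
"""


def terms_in_subset(obo_text, subset_name):
    """Return (id, name) pairs for [Term] stanzas tagged with `subset_name`."""
    found = []
    # Stanzas are separated by blank lines in this simplified reading.
    for stanza in obo_text.split("\n\n"):
        lines = [line.strip() for line in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        tags = {}
        for line in lines[1:]:
            tag, _, value = line.partition(": ")
            tags.setdefault(tag, []).append(value.split(" ! ")[0].strip())
        if subset_name in tags.get("subset", []):
            found.append((tags["id"][0], tags["name"][0]))
    return found


print(terms_in_subset(OBO_TEXT, "goslim_generic"))
# [('GO:0006412', 'translation')]
```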

But you can build your own

In OboEdit, select the Category Manager (under Metadata)

Use “add” to add a new one; I am adding one for translation

Now I browse through the GO, selecting terms and checking them in the categories box. Make sure you “commit” (save) each selected term. Note, the children of a term are not automatically selected. You need to decide.

After saving in the category manager, the new slim appears in the category list

Checking the “filter terms” box during save will allow you to save just your slim to a new file

Now you can use THIS obo in various binning tools such as GO Term Finder, VLAD, GO Slimmer, rather than the entire GO

GO Slimmer tool is part of AmiGO

You can specify your genes in a number of ways

You can filter on species and evidence code

You can input or choose a GO slim

You can also select various output options

The gene product counts and a tab-delimited file are great for making pie or bar charts in Excel!

Visit http://www.geneontology.org and http://www.godatabase.org for more GO Slim help