The Ontology of the Gene Ontology Barry Smith http://ifomis.de Jennifer Williams http://ontologyworks.com Steffen Schulze-Kremer http://ifomis.de

Page 1: The Ontology of the Gene Ontology

The Ontology of the Gene Ontology

Barry Smith, http://ifomis.de

Jennifer Williams, http://ontologyworks.com

Steffen Schulze-Kremer, http://ifomis.de

Page 2: The Ontology of the Gene Ontology

The Prime Directive

As the right of each sentient species to live in accordance with its normal cultural evolution is considered sacred, no Star Fleet personnel may interfere with the healthy development of alien life and culture. Such interference includes the introduction of superior knowledge, strength, or technology to a world whose society is incapable of handling such advantages wisely.

Page 3: The Ontology of the Gene Ontology

The Bioinformatics Prime Directive

no computer scientist may interfere with the information resources provided by biologists

Page 4: The Ontology of the Gene Ontology

The Story of GONG

Computer scientists develop

browsers,

query-interfaces,

tools for statistical analysis or for cross-ontology mapping

which take the biological information as something inviolable

Page 5: The Ontology of the Gene Ontology

IFOMIS: Renegade StarTroop

Institute for Formal Ontology and Medical Information Science

Faculty of Medicine

University of Leipzig

http://ifomis.de

Page 6: The Ontology of the Gene Ontology

The Gene Statistic

The Gene Ontology

Page 7: The Ontology of the Gene Ontology

GO: the Gene Ontology

3 large telephone directories of standardized designations for gene functions and products

designed to cover the whole of biology

model for

fungal ontology,

plant ontology,

drosophila ontology,

etc.

Page 8: The Ontology of the Gene Ontology

Primary aim of GO

not rigorous definition and principled classification

but rather: providing a practically useful framework for keeping track of the biological annotations that are applied to gene products

Thesis: GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously

Page 9: The Ontology of the Gene Ontology

GO: the Gene Ontology

GO divided into 3 separate hierarchies each organized via is_a and part_of

Page 10: The Ontology of the Gene Ontology

Problems with is_a

A is_a B = every instance of A is an instance of B
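The instance-level reading of is_a can be sketched as a subset test over sets of instances. This is only an illustration: GO does not enumerate instances, and the instance identifiers below are made up.

```python
# Toy extensions: each class name maps to the set of its instances.
# All instance identifiers are hypothetical, chosen for illustration only.
EXTENSION = {
    "vacuole": {"v1", "v2", "v3"},
    "protein storage vacuole": {"v1", "v2"},
    "flagellum": {"f1"},
}


def is_a(a, b, extension=EXTENSION):
    """Return True if every instance of class a is also an instance of class b."""
    return extension[a] <= extension[b]


print(is_a("protein storage vacuole", "vacuole"))
print(is_a("vacuole", "protein storage vacuole"))
```

On this reading, an is_a link whose parent term does not name a class of instances at all cannot be checked in this way.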

Page 11: The Ontology of the Gene Ontology

Problems with is_a

Holliday junction helicase complex is_a unlocalized

protein storage vacuole is_a vacuole (sensu Streptophyta)

Page 12: The Ontology of the Gene Ontology

Problems with part_of

‘part_of’ = ‘can be part of’ (flagellum part_of cell)

‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm)

‘part_of’ = ‘is included as a sublist in’
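The first two readings can be told apart once part_of is stated at the level of instances. The sketch below assumes a tiny, invented set of instances (including a hypothetical detached flagellum) and compares a strict "every instance" reading with a weak "some instance" reading. The third reading, sublist inclusion, concerns the vocabulary rather than instances and is not modelled here.

```python
# Hypothetical instance data: the class of each instance, and the
# instance-level part-whole links that hold between instances.
INSTANCE_OF = {
    "flagellum_1": "flagellum",
    "flagellum_2": "flagellum",  # a detached flagellum, part of no cell
    "cell_1": "cell",
}
PART_OF_INSTANCE = {("flagellum_1", "cell_1")}


def instances(cls):
    """Return the set of instances belonging to the class cls."""
    return {i for i, c in INSTANCE_OF.items() if c == cls}


def has_whole_in(part, whole_cls):
    """True if this instance is part of at least one instance of whole_cls."""
    return any((part, w) in PART_OF_INSTANCE for w in instances(whole_cls))


def always_part_of(a, b):
    """Strict reading: every instance of a is part of some instance of b."""
    return all(has_whole_in(x, b) for x in instances(a))


def sometimes_part_of(a, b):
    """Weak reading: at least one instance of a is part of some instance of b."""
    return any(has_whole_in(x, b) for x in instances(a))


print(always_part_of("flagellum", "cell"))
print(sometimes_part_of("flagellum", "cell"))
```

With this data the weak reading holds and the strict one fails, so a single label cannot safely cover both.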

Page 13: The Ontology of the Gene Ontology

GO divided into three disjoint term hierarchies

cellular component ontology: flagellum, chromosome, cell

molecular function ontology: ice nucleation, binding, protein stabilization

biological process ontology: glycolysis, death

Page 14: The Ontology of the Gene Ontology

three separate hierarchies

= no is_a and no part_of relations defined between them

PUZZLE: How are the classes in the three separate hierarchies linked together?

cellular component ontology

molecular function ontology

biological process ontology
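The separation can be stated as a check over typed edges: no is_a or part_of edge has its two endpoints in different hierarchies. The miniature term table and edge list below are assumptions made for illustration, not an extract from GO.

```python
# Hypothetical miniature of GO: term -> hierarchy, plus typed edges
# written as (child, relation, parent).
NAMESPACE = {
    "cell": "cellular_component",
    "flagellum": "cellular_component",
    "binding": "molecular_function",
    "glycolysis": "biological_process",
}
EDGES = [
    ("flagellum", "part_of", "cell"),
]


def cross_hierarchy_edges(edges, namespace):
    """Return the edges whose two endpoints lie in different hierarchies."""
    return [
        (child, rel, parent)
        for child, rel, parent in edges
        if namespace[child] != namespace[parent]
    ]


def linked_hierarchies(edges, namespace):
    """Return the set of unordered hierarchy pairs joined by some edge."""
    return {
        frozenset((namespace[c], namespace[p]))
        for c, _, p in cross_hierarchy_edges(edges, namespace)
    }


print(cross_hierarchy_edges(EDGES, NAMESPACE))
print(linked_hierarchies(EDGES, NAMESPACE))
```

Both results are empty for this data, which is the puzzle in concrete form: nothing in the structure itself says how a function term relates to a process or a component term.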

Page 15: The Ontology of the Gene Ontology

Component

Component is easy to understand:

A component is a 3-dimensional entity which endures through time

Page 16: The Ontology of the Gene Ontology

Process

Process is easy to understand:

A process is an occurrent entity = an entity which unfolds itself through time in successive temporal parts

Page 17: The Ontology of the Gene Ontology

What is a function?

Page 18: The Ontology of the Gene Ontology

Definition of «Function»

UMLS Semantic Network:

Functional Concept =df A concept which is of interest because it pertains to the carrying out of a process or activity.

GO:

Molecular Function =df the action characteristic of a gene product.

Page 19: The Ontology of the Gene Ontology

How are the 3 ontologies related?

Function = “the action characteristic of a gene product”

Process = “phenomenon marked by changes that lead to a particular result, mediated by one or more gene products”

NO PART-WHOLE RELATIONS BETWEEN FUNCTION AND PROCESS ONTOLOGIES

Page 20: The Ontology of the Gene Ontology

The True Story about Process and Function

A process is an occurrent entity

A component is a continuant entity

Page 21: The Ontology of the Gene Ontology

The True Story about Function and Process

A process is an occurrent entity

A component is an independent continuant entity

There are also dependent continuant entities:

qualities, roles, dispositions, powers …

and functions

Page 22: The Ontology of the Gene Ontology

The function of your heart is: to pump blood

This function endures through time and gets exercised.

This function exists even when it is not being exercised

The exercise of a function is a process

Page 23: The Ontology of the Gene Ontology

Functions exist even when they are not being expressed

Functions exist even when there is no functioning

Page 24: The Ontology of the Gene Ontology

Constituent-Process-Function

Processes depend on constituents

Processes realize functions

Constituents have functions
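These three relations can be sketched as a small data model. The class and field names are assumptions introduced for illustration; the heart example comes from the earlier slide, and the process name "heartbeat" is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A dependent continuant: it exists whether or not it is exercised."""
    name: str


@dataclass
class Constituent:
    """An independent continuant that has functions."""
    name: str
    functions: list[Function] = field(default_factory=list)


@dataclass
class Process:
    """An occurrent that depends on a constituent and realizes one of its functions."""
    name: str
    participant: Constituent
    realizes: Function

    def __post_init__(self):
        # A process may only realize a function that its constituent has.
        if self.realizes not in self.participant.functions:
            raise ValueError("constituent does not have the realized function")


def is_exercised(function, processes):
    """True if some process in the list realizes the given function."""
    return any(p.realizes is function for p in processes)


pump = Function("to pump blood")
heart = Constituent("heart", [pump])
print(is_exercised(pump, []))           # the function exists, but is not exercised
beat = Process("heartbeat", heart, pump)
print(is_exercised(pump, [beat]))       # now a process realizes it
```

Keeping Function and Process as distinct types is the design point: a function can be recorded for a constituent even when no process realizes it.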

Page 25: The Ontology of the Gene Ontology

Dependent continuants are realized through occurrent processes

the exercise of a function

the performance of a role

the execution of a plan

the application of a therapy

the realization of a disposition

the course of a disease

Page 26: The Ontology of the Gene Ontology

GO:

“A biological process is accomplished via one or more ordered assemblies of molecular functions.”

Page 27: The Ontology of the Gene Ontology

But no:

“GO molecular functions are occurrent rather than continuant. The terminology we've used to date is, I agree, confusing but the activities described in the molecular function ontology are events -- they represent the function as it is exercised rather than the potential to exercise that function.”

Page 28: The Ontology of the Gene Ontology

“The definitions you cite are certainly inconsistent with this at the moment, but this is a temporary situation. … true path violations … do crop up fairly regularly, but are always fixed.”

Page 29: The Ontology of the Gene Ontology

Confusion of Function and Activity

If function = activity (= functioning)

how can GO deal with dormant/suppressed functions?

How can GO deal with the relation of expression which involves a function and its exercise?

Page 30: The Ontology of the Gene Ontology

A step towards clarity

In March 2003 (nearly) all nodes in the Molecular Function ontology (except the root) had ‘activity’ added to their names

Function = activity

How does ‘process’ relate to ‘activity’?

Page 31: The Ontology of the Gene Ontology

GO’s answer

“A biological process is accomplished via one or more ordered assemblies of molecular functions.”

BUT: there are no part-whole relations across ontologies

Result: constant coding errors resulting from lack of clear principles as concerns what the basic notions of ‘function’ and ‘process’ mean

Page 32: The Ontology of the Gene Ontology

Examples of GO Molecular Functions

anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”)

enzyme activity (defined as: “a substance that catalyzes”)

structural molecule (defined as: “the action of a molecule that contributes to structural integrity”)

Page 33: The Ontology of the Gene Ontology

GO:0005199: structural constituent of cell wall

Definition: The action of a molecule that contributes to the structural integrity of a cell wall.

This definition confuses constituents with actions, which GO includes in its function ontology.
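Mismatches of this kind between a term's name and its definition can be flagged mechanically. The sketch below uses the names and definitions quoted on the slides; the keyword patterns are crude, hypothetical heuristics, not part of GO or of any GO tool.

```python
import re

# Term names and definitions as quoted on the slides.
TERMS = {
    "anti-coagulant activity": "a substance that retards or prevents coagulation",
    "enzyme activity": "a substance that catalyzes",
    "structural molecule": "the action of a molecule that contributes to structural integrity",
    "structural constituent of cell wall": "The action of a molecule that contributes to the structural integrity of a cell wall.",
}

# Hypothetical heuristics: simple keyword patterns chosen for this example.
ENTITY_NAME = re.compile(r"\b(molecule|constituent)\b", re.IGNORECASE)
ACTIVITY_NAME = re.compile(r"\bactivity$", re.IGNORECASE)
ENTITY_DEF = re.compile(r"^(a|an|the)\s+(substance|molecule)\b", re.IGNORECASE)
ACTION_DEF = re.compile(r"^(a|an|the)\s+action\b", re.IGNORECASE)


def category_mismatch(name, definition):
    """Return a short message if name and definition disagree in kind, else None."""
    if ACTIVITY_NAME.search(name) and ENTITY_DEF.match(definition):
        return "named as an activity but defined as a substance"
    if ENTITY_NAME.search(name) and ACTION_DEF.match(definition):
        return "named as an entity but defined as an action"
    return None


for name, definition in TERMS.items():
    problem = category_mismatch(name, definition)
    if problem:
        print(f"{name}: {problem}")
```

A check like this only catches surface patterns; deciding what kind of entity a term denotes still requires clear definitions of 'function' and 'process'.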

Page 34: The Ontology of the Gene Ontology

extracellular matrix structural constituent +

puparial glue (sensu Diptera)

structural constituent of bone

structural constituent of chorion (sensu Insecta)

structural constituent of chromatin

structural constituent of cuticle +

structural constituent of cytoskeleton

structural constituent of epidermis +

structural constituent of eye lens

structural constituent of muscle

structural constituent of myelin sheath

structural constituent of nuclear pore

structural constituent of peritrophic membrane (sensu Insecta)

structural constituent of ribosome

structural constituent of tooth enamel

structural constituent of vitelline membrane (sensu Insecta)

Page 35: The Ontology of the Gene Ontology

Problems caused by the lack of intuitive formal understandings of GO’s basic ontological terms

The need for expert knowledge places severe obstacles in the way of using GO as a basis for computer applications

computers do not have access to expert biological knowledge

Page 36: The Ontology of the Gene Ontology

As GO increases in size and scope

it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”.

The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.
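A few of the checks such software tools might perform can be sketched directly. The function names and the sample terms are assumptions made for illustration: the code looks for near-duplicate term names, terms with no parent link, and cycles in the child-to-parent graph.

```python
from collections import defaultdict


def duplicate_names(names):
    """Group names that coincide once case, hyphens and spacing are ignored."""
    groups = defaultdict(set)
    for name in names:
        key = "".join(ch for ch in name.lower() if ch.isalnum())
        groups[key].add(name)
    return [g for g in groups.values() if len(g) > 1]


def unlinked_terms(names, parents, root):
    """Return terms other than the root that have no parent at all."""
    return {n for n in names if n != root and not parents.get(n)}


def has_cycle(parents):
    """Depth-first search for a cycle in the child -> parents graph."""
    visiting, done = set(), set()

    def visit(term):
        if term in done:
            return False
        if term in visiting:
            return True
        visiting.add(term)
        found = any(visit(p) for p in parents.get(term, ()))
        visiting.discard(term)
        done.add(term)
        return found

    return any(visit(t) for t in list(parents))


# Hypothetical sample: the unhyphenated spelling is an invented near-duplicate.
names = ["anticoagulant activity", "anti-coagulant activity", "binding", "molecular_function"]
parents = {"binding": {"molecular_function"}}
print(duplicate_names(names))
print(unlinked_terms(names, parents, "molecular_function"))
print(has_cycle(parents))
```

Checks like these relieve the curator of some bookkeeping, but they do not replace an understanding of what the links mean.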

Page 37: The Ontology of the Gene Ontology

Benefits of the GO Approach

1) Work on populating GO could start immediately, without its authors needing to solve some of the intricate problems which face ontologies when formalized as logical theories.

2) Populating GO does not require the completion of complex protocols of formally determined steps but can be done intuitively by the expert biologist.

3) There are few formal constraints standing in the way of easy incorporation of existing controlled vocabularies from the biological domain.

Page 38: The Ontology of the Gene Ontology

Drawbacks

1) It is unclear what kinds of reasoning are permissible on the basis of GO’s hierarchies.

2) The rationale of GO’s subclassifications is unclear.

3) No procedures are offered by which GO can be validated.

4) There are insufficient rules for determining how to recognize whether a given concept is or is not present in GO.

Page 39: The Ontology of the Gene Ontology

GO DOES NOT COMPUTE

Solution:

Rebuild from scratch before it is too late

MANGO