Intelligence Ontology: A Strategy for the Future

Barry Smith, University at Buffalo

http://ontology.buffalo.edu/smith

NCOR: National Center for Ontological Research

The goal of NCOR is to advance ontological research and applications by promoting the creation and use of high-quality ontologies. It thus supports the development of metrics for ontology evaluation designed to bring about an evolutionary improvement in ontology quality. NCOR serves as a vehicle to coordinate, to enhance, to publicize, and to seek funding for ontology-related activities in fields such as scientific research, intelligence analysis and multisource information fusion, qualitative spatiotemporal reasoning, and terminological systems. It organizes conferences and research groups focusing on specific aspects of ontology development, facilitates the exchange of research personnel for short- and long-term visits, and participates in nationally and internationally funded research networks.

http://ncor.us

OIC-2007 Proceedings: http://ceur-ws.org, Volume 299

OIC-2007: ONTOLOGY FOR THE INTELLIGENCE COMMUNITY

Towards effective exploitation and integration of intelligence resources

NOVEMBER 28-29, 2007 · COLUMBIA, MD

ECOR – European Center for Ontological Research

JCOR – Japanese Center for Ontological Research

Inaugural meeting in Tokyo, February 26-27, 2008

NCOR

• Science-based ontology evaluation to create an evolutionary path towards ontology improvement

• Ontology interoperability

• Promotion of best practice

The problem we face in biology...

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANV

How to do biology across the genome

• where in the body?
• where in the cell?
• what kind of disease process?
• how was the data collected?

ontologies = high quality controlled structured vocabularies for the annotation (description) of data

annotating images

The OBO Foundry Idea

[Figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, shown with the term "sphingolipid transporter activity"]

The OBO Foundry Idea

[Figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, shown with the term "Holliday junction helicase complex"]

Broad-coverage semantic annotation systems, which will enable intelligent integration of gigantic bodies of heterogeneous data, also need to be created outside biology.

geospatial, transport, religion, weather, ethnicity, chemicals, politics, law

using common rules drawing on best practices for creating ontologies

... and for linking ontologies

In areas such as:

geospatial, transport, religion, weather, ethnicity, chemicals, politics, law

exploiting the division of labor

relying on champions in dispersed communities to invest in public-domain resources

We will also need:

ontology of documents

ontology of provenance

ontology of names

ontology of numbers (IDs)

ontology of signatures

ontology of identity

...

Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

The OBO Foundry (http://obofoundry.org)

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

Ontologies facilitate grouping of annotations

Annotations per term: brain 20, hindbrain 15, rhombomere 10

Query "brain" without ontology: 20

Query "brain" with ontology: 45
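The jump from 20 to 45 can be reproduced with a small sketch. It assumes the three terms are linked by part_of relations (rhombomere part_of hindbrain, hindbrain part_of brain); the slide gives only the counts, so the relation table and the function names below are illustrative assumptions rather than part of any OBO Foundry tool.

```python
# Assumed part_of links between the three terms on the slide (child -> parent).
PART_OF = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Number of annotations attached directly to each term (figures from the slide).
DIRECT_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def ancestors(term, part_of):
    """Yield every term reachable from `term` by following part_of links."""
    while term in part_of:
        term = part_of[term]
        yield term


def query_without_ontology(term, counts):
    """Plain string match: only annotations that use the queried term itself."""
    return counts.get(term, 0)


def query_with_ontology(term, counts, part_of):
    """Also count annotations made to any term that is part of the queried term."""
    total = 0
    for annotated, n in counts.items():
        if annotated == term or term in ancestors(annotated, part_of):
            total += n
    return total


print(query_without_ontology("brain", DIRECT_COUNTS))        # 20
print(query_with_ontology("brain", DIRECT_COUNTS, PART_OF))  # 45
```

Under the same assumed links, a query for "hindbrain" would return 25 (15 direct plus 10 for rhombomere), since grouping only flows from parts up to wholes.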

All OBO Foundry ontologies work in the same way

• we have data (biosample, haplotype, clinical data, survey data, ...)

• we need to make this data available for semantic search and algorithmic processing

• we create a consensus-based ontology for annotating the data

to enhance alignment of data about instances (communities, places, ...)

Community / Population Ontology

• family, clan
• ethnicity
• religion
• diet
• social networking
• education (literacy ...)
• healthcare (economics ...)
• ...
• household forms
• demography
• public health

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Family, Community, Deme, Population; Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

http://obofoundry.org

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

http://obofoundry.org

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

[Vertical label alongside the matrix: ENVIRONMENT]

http://obofoundry.org

The Environment Ontology

• OBO Foundry
• Genomic Standards Consortium
• Natural Environment Research Council (UK)
• Barcode of Life Project
• Encyclopedia of Life Project

Applications of EnvO in Biology

Support the annotation of meta-data related to:

Data about biological samples produced from various technologies
• Metagenomics, Metabolomics, Proteomics, Transcriptomics, Genomics...

Data produced from remote sensing equipment

Images
• Web 2.0, tagging

Physical holdings
• Museum artifacts, (preserved) biological samples / organisms

...anything that has an environment

How EnvO currently works for information retrieval

Retrieve all experiments on organisms obtained from:
• deep-sea thermal vents
• arctic ice cores
• rainforest canopy
• alpine melt zone

Retrieve all data on organisms sampled from:
• hot and dry environments
• cold and wet environments
• a height above 5,000 meters

Retrieve all the omic data from soil organisms subject to:
• moderate heavy metal contamination
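Queries such as "hot and dry environments" can be read as conjunctions over more general environment classes. The sketch below illustrates that reading only: the is_a links, the sample records and the function names are all hypothetical and are not taken from EnvO itself.

```python
# Hypothetical is_a links (term -> more general terms); not actual EnvO content.
IS_A = {
    "desert": {"hot environment", "dry environment"},
    "rainforest canopy": {"hot environment", "wet environment"},
    "deep-sea thermal vent": {"hot environment", "wet environment"},
    "arctic ice core": {"cold environment"},
    "alpine melt zone": {"cold environment", "wet environment"},
}

# Hypothetical sample records, each annotated with one environment term.
SAMPLES = [
    {"id": "S1", "environment": "desert"},
    {"id": "S2", "environment": "rainforest canopy"},
    {"id": "S3", "environment": "arctic ice core"},
    {"id": "S4", "environment": "deep-sea thermal vent"},
    {"id": "S5", "environment": "alpine melt zone"},
]


def subsumed_by(term, general, is_a):
    """True if `term` equals `general` or reaches it through is_a links."""
    if term == general:
        return True
    return any(subsumed_by(parent, general, is_a) for parent in is_a.get(term, ()))


def retrieve(samples, required_terms, is_a):
    """Return ids of samples whose environment falls under every required term."""
    return [
        s["id"]
        for s in samples
        if all(subsumed_by(s["environment"], t, is_a) for t in required_terms)
    ]


print(retrieve(SAMPLES, ["hot environment", "dry environment"], IS_A))   # ['S1']
print(retrieve(SAMPLES, ["cold environment", "wet environment"], IS_A))  # ['S5']
print(retrieve(SAMPLES, ["deep-sea thermal vent"], IS_A))                # ['S4']
```

Letting a term have several parents is a deliberate choice in this sketch: it allows one annotation ("desert") to answer both the "hot" and the "dry" query without annotating the sample twice.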

Environment = totality of circumstances external to a living organism or group of organisms

• pH
• evapotranspiration
• turbidity
• available light
• predominant vegetation
• predatory pressure
• nutrient limitation …

extend EnvO to the clinical domain

neighborhood patterns
• built environment, living conditions
• climate
• social networking
• crime, transport
• education, religion, work
• health, hygiene

disease patterns
• bio-environment (bacteriological, ...)
• patterns of disease transmission (links to IDO)

works in tandem with

GAZ.obo: An Open Source Gazetteer Constructed on Ontological Principles

http://darwin.nerc-oxford.ac.uk/gc_wiki/index.php/EnvO_Project

The Environment Ontology
