Building the Ontology Landscape for Cancer Big Data Research Barry Smith May 12, 2015

Addressing cancer big data challenges

Session 1: through imaging ontologies (BS)

Session 2: by capturing metadata for data integration and analysis (Chris Stoeckert)

Session 3: through the Ontology of Disease (Lynn Schriml and Lindsay Cowell)

Public Session: Cancer Big Data to Knowledge (BS)


National Center for Biomedical Ontology (NCBO)

NIH Roadmap Center 2005-2015

Gene Ontology

Semantic Web


Old biology data


MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV

New biology data


How to do biology across the genome?

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVMKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVMKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVMKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMR
TFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV


how to link the kinds of phenomena represented here


MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE

to data like this?

Answer

Tag the data with meaningful labels which together form an ontology

~ Semantic enhancement

An ontology is a controlled structured vocabulary to support annotation of data
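A minimal sketch of what tagging data with a controlled vocabulary might look like in practice. The term identifiers follow the Gene Ontology style; the record names (`protein_A`, `protein_B`), the helper functions, and the annotations themselves are hypothetical and only illustrate the idea that every tag must come from the shared vocabulary.

```python
from collections import defaultdict

# Small controlled vocabulary: term ID -> label.
# The IDs are GO-style; this three-term table is illustrative only.
VOCABULARY = {
    "GO:0005634": "nucleus",
    "GO:0003677": "DNA binding",
    "GO:0006915": "apoptotic process",
}


def annotate(annotations, record_id, term_id):
    """Tag a data record with a term, rejecting labels outside the vocabulary."""
    if term_id not in VOCABULARY:
        raise ValueError(f"{term_id} is not a term in the controlled vocabulary")
    annotations[record_id].add(term_id)


def records_tagged_with(annotations, term_id):
    """Return the sorted IDs of all records annotated with the given term."""
    return sorted(r for r, terms in annotations.items() if term_id in terms)


# Hypothetical records and annotations, not real curated data.
annotations = defaultdict(set)
annotate(annotations, "protein_A", "GO:0005634")
annotate(annotations, "protein_A", "GO:0003677")
annotate(annotations, "protein_B", "GO:0005634")

print(records_tagged_with(annotations, "GO:0005634"))
```

Because both records carry the same term ID rather than free-text labels, a single query retrieves them together, which is the point of semantic enhancement.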


Questions

How to build an ontology?

How to bring it about that all scientists in each domain use the same ontology to annotate their data?

How to bring it about that scientists in neighboring domains use ontologies that are interoperable?


By far the most successful: GO (Gene Ontology)


GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data

• multi-species, multi-disciplinary, open source

• built by biologists, maintained and improved by biologists

• contributes to the cumulativity of scientific results obtained by distinct research communities


International System of Units (SI)


Gene products involved in cardiac muscle development in humans


Prerequisites for ontology success

• Aggressive use in tagging data across multiple communities

• Feedback cycle between ontology editors and ontology users to ensure continuous update

• Logically and biologically coherent definitions

– logical = to allow computational reasoning and quality assurance

– biological = to ensure consistency between ontologies


GO is amazingly successful

but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

and it does not provide representations of diseases, symptoms, anatomy, pathways, experiments …


Ontology success stories, and some reasons for failure

So people started building the needed extra ontologies more or less at random


Definition: Reaching a decision through the application of an algorithm designed to weigh the different factors involved.


Confuses an algorithm with an act of reaching a decision

Defines ‘algorithm’ as a special kind of application of an algorithm. (This is worse than circular.)


John Fox (Director, OpenClinical)

As a user and teacher of ontological methods in medicine and engineering I have for years warned my students that the design of domain ontologies is a black art with no theoretical foundations and few practical principles.


Ontology success stories, and some reasons for failure

Linked Open Data, from Musicbrainz to Mouse Genome Informatics


What are the criteria of success for ontologies in supporting reasoning over Big Data?

1. logically and biologically correct subsumption hierarchies

– correct: Beta cell is_a cell

– incorrect: allergy is_a allergy record (in Microsoft HealthVault)
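To illustrate why a correct subsumption hierarchy matters for reasoning, here is a small sketch that follows is_a edges transitively. The edge table, the intermediate term "endocrine cell", and the function names are assumptions made for this example; they are not taken from any released ontology.

```python
# Hypothetical is_a edges (child -> set of direct parents).
IS_A = {
    "beta cell": {"endocrine cell"},
    "endocrine cell": {"cell"},
}


def ancestors(term, edges=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(child, parent, edges=IS_A):
    """True if `child` is_a `parent`, directly or through intermediate terms."""
    return child == parent or parent in ancestors(child, edges)


print(subsumed_by("beta cell", "cell"))          # True
print(subsumed_by("allergy", "allergy record"))  # False
```

A query for data about cells can then safely include data tagged with "beta cell". If the hierarchy asserted that an allergy is a kind of record, the same inference would return records when the user asked for allergies.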


John Fox, again

As a user and teacher of ontological methods in medicine and engineering I have for years warned my students that the design of domain ontologies is a black art with no theoretical foundations and few practical principles. … I now have a much more positive story for my students. … In the journey from black art to a truly scientific theory for ontology design this book is an important milestone.


Original OBO Foundry ontologies (Gene Ontology in yellow), arranged by RELATION TO TIME (columns) and GRANULARITY (rows)

ORGAN AND ORGANISM
– INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– OCCURRENT: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
– DEPENDENT CONTINUANT: Cellular Function (GO)

MOLECULE
– INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
– DEPENDENT CONTINUANT: Molecular Function (GO)
– OCCURRENT: Molecular Process (GO)

– CHEBI: Chemical Entities of Biological Interest
– CL: Cell Ontology
– GO: Gene Ontology
– OBI: Ontology for Biomedical Investigations
– PATO: Phenotypic Quality Ontology
– PO: Plant Ontology
– PRO: Protein Ontology
– XAO: Xenopus Anatomy Ontology
– ZFA: Zebrafish Anatomy Ontology

http://obofoundry.org


Extension Strategy + Modular Organization

Levels: top level, mid-level, domain level

Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)

Domain ontologies:
– Anatomy Ontology (FMA*, CARO)
– Disease Ontology (OGMS, IDO, HDO, HPO)
– Biological Process Ontology (GO)
– Cell Ontology (CL)
– Subcellular Anatomy Ontology (SAO)
– Phenotypic Quality Ontology (PATO)
– Sequence Ontology (SO)
– Molecular Function Ontology (GO)
– Protein Ontology (PRO)

Example: The Cell Ontology


Rationale of OBO Foundry coverage, arranged by RELATION TO TIME (columns) and GRANULARITY (rows)

ORGAN AND ORGANISM
– INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– OCCURRENT: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
– INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
– DEPENDENT CONTINUANT: Cellular Function (GO)
– OCCURRENT: Cellular Process (GO)

MOLECULE
– INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RNAO, PRO)
– DEPENDENT CONTINUANT: Molecular Function (GO)
– OCCURRENT: Molecular Process (GO)

OBO Foundry coverage with environments added, arranged by RELATION TO TIME (columns) and GRANULARITY (rows)

ORGAN AND ORGANISM
– INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– OCCURRENT: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
– DEPENDENT CONTINUANT: Cellular Function (GO)

MOLECULE
– INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
– DEPENDENT CONTINUANT: Molecular Function (GO)
– OCCURRENT: Molecular Process (GO)

Environments: Environment Ontology (EnvO)

Examples of OBO Foundry approach extended into other domains

– NIF Standard: Neuroscience Information Framework
– IDO Consortium: Infectious Disease Ontology Suite
– cROP: Common Reference Ontologies for Plants
– UNEP Ontology Framework: United Nations Environment Program Ontologies

Common Reference Ontologies for Plants (cROP)


The second important criterion of ontology success in supporting reasoning over Big Data is: keeping track of provenance

= recording how data was generated and processed in a way external users can understand, to enhance

• combinability

• reproducibility
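One way to picture provenance tracking is a dataset that carries an explicit record of every step that produced it. The sketch below is an assumption-laden illustration: the class names, the process labels ("assay", "normalization"), and the tool names are hypothetical, and a real system would draw such labels from a shared ontology rather than from free strings.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceStep:
    """One step in the history of a dataset (all labels are hypothetical)."""
    process: str
    tool: str
    parameters: dict = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    rows: list
    provenance: list = field(default_factory=list)

    def apply(self, step, transform):
        """Return a new dataset with transformed rows and an extended history."""
        new_rows = [transform(r) for r in self.rows]
        return Dataset(self.name, new_rows, self.provenance + [step])

    def provenance_json(self):
        """Serialize the history so external users can inspect it."""
        return json.dumps([asdict(s) for s in self.provenance], sort_keys=True)


raw = Dataset(
    "sample_42",
    [10.0, 20.0, 30.0],
    [ProvenanceStep("assay", "hypothetical_scanner", {"channel": "green"})],
)
scaled = raw.apply(
    ProvenanceStep("normalization", "divide_by_max", {"max": 30.0}),
    lambda x: x / 30.0,
)
print(scaled.provenance_json())
```

Because `apply` returns a new object instead of mutating the old one, the raw data and its shorter history remain available, which supports both combinability and reproducibility.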


Recognizing a new family of protocol-driven processes (investigation, assay, …)

OBO Foundry coverage, arranged by RELATION TO TIME (columns) and GRANULARITY (rows)

ORGAN AND ORGANISM
– INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– DEPENDENT CONTINUANT: Organ Function (FMP, CPRO)
– OCCURRENT: Biological Process (GO); Ontology for Biomedical Investigations (OBI)

CELL AND CELLULAR COMPONENT
– INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
– DEPENDENT CONTINUANT: Cellular Function (GO)

MOLECULE
– INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
– DEPENDENT CONTINUANT: Molecular Function (GO)
– OCCURRENT: Molecular Process (GO)

Side labels: Environment Ontology (ENVO); Phenotypic Quality (PATO)

Extension Strategy + Modular Organization

Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)

Domain ontologies:
– Anatomy Ontology (FMA*, CARO)
– Disease Ontology (OGMS, IDO, HDO, HPO)
– Biological Process
– Protocol-driven process (OBI)
– Cell Ontology (CL)
– Subcellular Anatomy Ontology (SAO)
– Phenotypic Quality Ontology (PATO)
– Sequence Ontology (SO)
– Molecular Function Ontology (GO)
– Protein Ontology (PRO)

Structure of a typical investigation as viewed by OBI (from http://obi-ontology.org/page/Investigation)

The Ontology for Biomedical Investigations


Recognizing a new family of information entities: data, publications, images, algorithms …

OBO Foundry coverage, arranged by RELATION TO TIME (columns) and GRANULARITY (rows)

ORGAN AND ORGANISM
– INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– DEPENDENT CONTINUANT: Organ Function (FMP, CPRO)

CELL AND CELLULAR COMPONENT
– INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
– DEPENDENT CONTINUANT: Cellular Function (GO)

MOLECULE
– INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
– DEPENDENT CONTINUANT: Molecular Function (GO)

INFORMATION ARTIFACT (IAO): Software, Algorithms, …; Sequence Data, EHR Data …; Images, Image Data, Flow Cytometry Data, …

OCCURRENT: Biological Process (GO); Molecular Process (GO); OBI; OBI: Imaging

Side labels: Environment Ontology (ENVO); Phenotypic Quality (PATO)

Extension Strategy + Modular Organization

Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), INFORMATION ARTIFACT (~DATA), OCCURRENT (~PROCESS)

Domain ontologies:
– Anatomy Ontology (FMA*, CARO)
– Disease Ontology (OGMS, IDO, HDO, HPO)
– Data
– Biological Process
– Assays
– Cell Ontology (CL)
– Subcellular Anatomy Ontology (SAO)
– Phenotypic Quality Ontology (PATO)
– Sequence Ontology (SO)
– Molecular Function Ontology (GO)
– Protein Ontology (PRO)

Even here, things are not as bad as they seem


IAO = Information Artifact Ontology:

https://code.google.com/p/information-artifact-ontology/


http://bioportal.bioontology.org/ontologies/IAO


A list of ontologies using IAO

– Adverse Event Reporting Ontology (AERO)
– Bioinformatics Web Service Ontology
– Biological Collections Ontology (BCO)
– Chemical Methods Ontology (CHMO)
– Cognitive Paradigm Ontology (COGPO)
– Comparative Data Analysis Ontology
– Computational Neuroscience Ontology
– Core Clinical Protocol Ontology (C2PO)
– Document Act Ontology
– Eagle-I Research Resource Ontology (ERO)
– The Email Ontology
– Emotion Ontology (MFOEM)
– Experimental Factor Ontology (EFO)
– Exposé Ontology
– IAO-Intel
– Infectious Disease Ontology (IDO)
– Influenza Research Database (IRD)
– Information Entity Ontology
– Mental Functioning Ontology (MF)
– Ontology for Biomedical Investigations
– Ontology for Drug Discovery Investigations
– Ontology for General Medical Science (OGMS)
– Ontology for Newborn Screening Follow-up and Translational Research (ONSTR)
– Ontology of Clinical Research (OCRE)
– Ontology of Data Mining (OntoDM)
– Ontology of Medically Related Social Entities (OMRSE)
– Ontology of Vaccine Adverse Events
– Oral Health and Disease Ontology (OHDO)
– Population and Community Ontology
– Proper Name Ontology
– Semanticscience Integrated Ontology
– Software Ontology (SWO)
– Translational Medicine Ontology (TMO)
– Twitter Ontology
– Vaccine Ontology (VO)

Aboutness

Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)

IAO, OBI

Diagram labels:
– Patient Demographics
– Phenotype (Disease, …)
– Disease processes
– Anatomy
– Histology
– Genotype (GO)
– Biological processes (GO)
– Chemistry
– Instruments, Biomaterials, Functions
– Parameters, Assay types, Statistics …
– Data about all of these things including image data … algorithms, software, protocols, …

Biomedical imaging ontology

Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)

IAO, OBI

Diagram labels:
– Patient Demographics
– Phenotype (Disease, …)
– Disease processes
– Anatomy
– Histology
– Genotype (GO)
– Biological processes (GO)
– Chemistry
– Instruments, Biomaterials, Functions
– Parameters, Assay types, Statistics
– Data about all of these things including image data … algorithms, software, protocols, …

The third important criterion of ontology success in supporting reasoning over Big Data is: use the framework of modular, general-purpose reference ontologies as starting points for creating families of purpose-specific application ontologies in ever widening circles (scalability)


BFO

Ontology for General Medical Science (OGMS)
– Cardiovascular Disease Ontology
– Genetic Disease Ontology
– Cancer Disease Ontology
– Immune Disease Ontology
– Environmental Disease Ontology
– Oral Disease Ontology

Infectious Disease Ontology
– IDO Staph Aureus
– IDO MRSA
– IDO Australian MRSA
– IDO Australian Hospital MRSA
– …
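The modular strategy of building application ontologies on top of reference ontologies can be sketched as a chain of modules, each extending exactly one parent and adding its own terms. The module names echo those listed above, but the nesting order, the `extends` field, and every term in the table are assumptions made for illustration only.

```python
# Each module names the module it extends and the terms it adds.
# Nesting and terms are illustrative assumptions, not the actual ontology content.
MODULES = {
    "BFO": {"extends": None, "terms": {"continuant", "occurrent"}},
    "OGMS": {"extends": "BFO", "terms": {"disease", "disorder"}},
    "IDO": {"extends": "OGMS", "terms": {"infectious disease", "pathogen"}},
    "IDO Staph Aureus": {
        "extends": "IDO",
        "terms": {"Staphylococcus aureus infection"},
    },
    "IDO MRSA": {"extends": "IDO Staph Aureus", "terms": {"MRSA infection"}},
}


def import_chain(name, modules=MODULES):
    """List a module followed by every module it builds on, up to the top level."""
    chain = []
    while name is not None:
        chain.append(name)
        name = modules[name]["extends"]
    return chain


def visible_terms(name, modules=MODULES):
    """Collect the terms a module defines plus those it inherits from its chain."""
    terms = set()
    for module in import_chain(name, modules):
        terms |= modules[module]["terms"]
    return terms


print(import_chain("IDO MRSA"))
print(sorted(visible_terms("IDO")))
```

The narrow application module reuses every term defined above it, while the general modules stay free of application-specific terms. That separation is what lets new families of ontologies be added without reworking the reference layer.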


Problems with:

Denys-Drash syndrome is_a rare non-neoplastic disorder

1. Denys-Drash syndrome involves nephroblastoma and is therefore neoplastic

2. X is_a rare Y does not track biology


What are the criteria of success for ontologies in supporting reasoning over Big Data?

– correct: Beta cell is_a cell

– incorrect: rare disease is_a disease

If the ontology hierarchy is to support biologically useful reasoning it must track biology