Triples for the People (Scientists): Liberating biological knowledge with the Semantic Web

Ottawa/Chicago Semantic Web Meetup : 23-11-09

Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics
Carleton University
Department of Biology
School of Computer Science
Institute of Biochemistry
Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering


DESCRIPTION

The Semantic Web is an emerging web of knowledge. It provides the basis upon which we can publish, share and link data and, perhaps more saliently, use computers to reason about increasingly complex information using background knowledge. From the dream to using triples as a currency to pay for it, this talk will illustrate the application of Semantic Web technologies to biological knowledge discovery while touching on issues in knowledge representation, RDFizing, large-scale data integration and convergence with Semantic Web services.


Page 2

Web-based Knowledge Discovery: a very painful process

Carole Goble (ISWC 2005)

Page 3

With current web search engines, it takes a lot of digging to get answers

Page 4

Portals provide structured information and give better results

Page 5

Surface web: 167 terabytes

Deep web: 91,000 terabytes

545-to-one

We need to expose the deep web

Page 6

Data silos – not made for sharing

Page 7

We want to simultaneously query the 1000+ biological databases

Page 8

How do we integrate these resources?

Page 9

The Semantic Web is a web of knowledge.

It is about standards for publishing, sharing and querying knowledge drawn from diverse sources

It enables the answering of sophisticated questions

Page 10

A growing web of linked data

Page 11

Bio2RDF provides a framework to link data networks together

Page 12

Resource Description Framework (RDF)

A Uniform Resource Identifier (URI) can be used as an entity name

http://bio2rdf.org/uniprot:P05067 is a name for Amyloid precursor protein

http://bio2rdf.org/omim:104300 is a name for Alzheimer disease

uniprot:P05067

omim:104300

Allows one to talk about anything
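The short names uniprot:P05067 and omim:104300 follow a namespace:identifier pattern that maps onto the two URIs shown on this slide. A minimal Python sketch of that expansion, assuming the http://bio2rdf.org/ base seen in those URIs; the function name `bio2rdf_uri` is hypothetical and not part of any Bio2RDF library:

```python
# Base taken from the URIs shown on the slide.
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(name: str) -> str:
    """Expand a 'namespace:identifier' short name into a Bio2RDF-style URI.

    Hypothetical helper for illustration only.
    """
    namespace, sep, identifier = name.partition(":")
    if not (namespace and sep and identifier):
        raise ValueError(f"expected 'namespace:identifier', got {name!r}")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


for short_name in ("uniprot:P05067", "omim:104300"):
    print(short_name, "->", bio2rdf_uri(short_name))
```

Because the URI is derived mechanically from the namespace and identifier, two groups that follow the same pattern end up naming the same entity the same way.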

Page 13

Resource Description Framework (RDF)

An RDF statement consists of:
– Subject: resource identified by a URI
– Predicate: resource identified by a URI
– Object: resource or literal

uniprot:P05067 is a Protein

Allows one to express statements
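The subject/predicate/object structure can be sketched as a small Python type that prints one statement per line in N-Triples style. This is an illustration under assumptions: the `Triple` type and `to_ntriples` function are hypothetical, the class URI http://example.org/Protein is a placeholder, and literal escaping is simplified to backslashes and double quotes.

```python
from typing import NamedTuple

# The standard rdf:type predicate, which is what "is a" corresponds to in RDF.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str                  # a URI
    predicate: str                # a URI
    obj: str                      # a URI, or literal text
    obj_is_literal: bool = False  # True when obj is a literal


def to_ntriples(t: Triple) -> str:
    """Render one statement as a single N-Triples-style line (simplified)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


statement = Triple(
    "http://bio2rdf.org/uniprot:P05067",
    RDF_TYPE,
    "http://example.org/Protein",  # placeholder class URI, an assumption
)
print(to_ntriples(statement))
```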

Page 14

Multi-Source Data Integration

[Diagram: statements about uniprot:P05067 from three sources (UniProt + Gene Ontology + iRefIndex) are added together into a unified view. Statements shown: uniprot:P05067 is a Protein; uniprot:P05067 located in Membrane; uniprot:P05067 interacts with uniprot:P05067. The unified view also shows a "has name" edge.]

The unified view depends on consistent naming
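Why the unified view depends on consistent naming can be sketched in a few lines of Python: statements from different sources only collect under one entry when they use the same subject name. Which source contributes which statement is my reading of the diagram, and `unified_view` is a hypothetical helper.

```python
from collections import defaultdict

# One statement per source; the source attribution is an assumption.
uniprot = {("uniprot:P05067", "is a", "Protein")}
gene_ontology = {("uniprot:P05067", "located in", "Membrane")}
irefindex = {("uniprot:P05067", "interacts with", "uniprot:P05067")}


def unified_view(*sources):
    """Group (predicate, object) pairs from all sources by subject name."""
    view = defaultdict(set)
    for source in sources:
        for subject, predicate, obj in source:
            view[subject].add((predicate, obj))
    return dict(view)


view = unified_view(uniprot, gene_ontology, irefindex)
for predicate, obj in sorted(view["uniprot:P05067"]):
    print("uniprot:P05067", predicate, obj)
```

If one source had called the protein something else, its statement would land under a separate key and the merge would silently miss it.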

Page 15

Building statements creates knowledge

[Diagram: uniprot:P05067 (label "Amyloid precursor protein") is a Protein; omim:104300 (label "Alzheimer Disease") is a Disease; uniprot:P05067 is involved in omim:104300.]

Page 16

RDF has multiple representations

RDF/XML

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [ <!ENTITY u "http://purl.uniprot.org/uniprot/"> ]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:u="http://purl.uniprot.org/uniprot/">
  <rdf:Description rdf:about="&u;Q16665">
    <rdf:type rdf:resource="&u;Protein"/>
  </rdf:Description>
</rdf:RDF>

RDF/N3

@prefix u: <http://purl.uniprot.org/uniprot/> .

u:Q16665 a u:Protein .

Page 17

Bio2RDF’s RDFized data fits together

Page 18

Bio2RDF serves up over 4 billion triples of linked biological data

Page 19

Something you can look up or search for, with rich descriptions

Page 20

Bio2RDF: Raw Data!

Page 21

SPARQL is the new cool kid on the block

SQL SPARQL

Page 22

Bio2RDF’s describe service uses SPARQL

CONSTRUCT {
  ?s ?p ?o .
}
WHERE {
  ?s ?p ?o .
  FILTER(?s = <http://bio2rdf.org/ns:id>)
}

Sent to http://ns.bio2rdf.org/sparql?query=...

http://bio2rdf.org/ns:id
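How such a request could be assembled is sketched below in Python, using only the standard library. It assumes that "ns" and "id" in the slide are placeholders for a namespace and an identifier, and that the endpoint host follows the http://ns.bio2rdf.org/sparql pattern shown; the function names are hypothetical, and the sketch only builds the URL without contacting any server.

```python
from urllib.parse import urlencode


def describe_query(ns: str, identifier: str) -> str:
    """Build the CONSTRUCT query from the slide for one resource."""
    resource = f"http://bio2rdf.org/{ns}:{identifier}"
    return (
        "CONSTRUCT { ?s ?p ?o . }\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  FILTER(?s = <{resource}>)\n"
        "}"
    )


def describe_url(ns: str, identifier: str) -> str:
    """Build the request URL; the host pattern is assumed from the slide."""
    endpoint = f"http://{ns}.bio2rdf.org/sparql"
    return endpoint + "?" + urlencode({"query": describe_query(ns, identifier)})


print(describe_query("uniprot", "P05067"))
print(describe_url("uniprot", "P05067"))
```

`urlencode` percent-encodes the braces, question marks and angle brackets so the query survives as a single `query` parameter.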

Page 23

Bio2RDF’s search service uses SPARQL

http://bio2rdf.org/search/hexokinase

kegg

uniprot

gene

bio2rdf.org

Page 24

Yay for data!

But how do we discover more than what was in the data?

Page 25

Ontology as Strategy

Page 26

Reasoning and Inference through Semantics

[Diagram: the fact "uniprot:P05067 is a Protein" combined with the ontology statement "Protein is a Molecule" yields the inferred statement "uniprot:P05067 is a Molecule". Facts plus ontology form the knowledge base.]
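The inference on this slide can be sketched in Python: starting from the asserted fact that uniprot:P05067 is a Protein and the ontology statement that Protein is a Molecule, walk up the "is a" links to collect every type that follows. This is a toy illustration, not how an OWL reasoner is implemented; it assumes each class has at most one direct superclass, and `inferred_types` is a hypothetical name.

```python
# Asserted fact (individual, class) and ontology (class -> direct superclass).
facts = {("uniprot:P05067", "Protein")}
subclass_of = {"Protein": "Molecule"}


def inferred_types(individual, facts, subclass_of):
    """Return asserted types plus every superclass reachable via 'is a'."""
    types = {cls for ind, cls in facts if ind == individual}
    frontier = list(types)
    while frontier:
        parent = subclass_of.get(frontier.pop())
        if parent is not None and parent not in types:
            types.add(parent)
            frontier.append(parent)
    return types


print(sorted(inferred_types("uniprot:P05067", facts, subclass_of)))
```

The statement "uniprot:P05067 is a Molecule" appears in the result even though no source ever asserted it.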

Page 27

Logic Based Ontologies Are Conceptual Lego

Page 28

A simple ontology: Animals

[Diagram: class hierarchy with the classes Living Thing, Animal, Plant, Grass, Tree, Body Part, Arm, Leg, Person, Cow, Carnivore and Herbivore, connected by "eats" and "has part" relations.]

Page 29

The Web Ontology Language (OWL) Has Explicit Semantics

Can therefore be used to capture knowledge in a machine understandable way

Page 30

Key idea: Subsumption

• Subsumption is the primary axis (relationship) in OWL
• Superclass/subclass relationship, “is a”
• All members of a subclass must be members of its superclasses

• All Proteins are also Molecules
• Protein is a subclass of Molecule
• Molecule is a superclass of Protein
• Molecule subsumes Protein

owl:Thing is the superclass of all Classes

[Diagram: Protein, Molecule]

Page 31

Key Idea: Disjointness

Stating that 2 classes are disjoint means something cannot be both a Protein and DNA

[Diagram: the classes Protein and DNA, with a marker denoting an individual]

This can help us find errors
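How a disjointness statement surfaces errors can be sketched in Python: any individual asserted to belong to both classes of a disjoint pair is flagged. The individual example:x1 is made up to show a violation, and `disjointness_violations` is a hypothetical helper, not a reasoner API.

```python
# Protein and DNA are declared disjoint, as on the slide.
disjoint_pairs = {frozenset({"Protein", "DNA"})}

# Asserted class memberships; example:x1 is an invented, deliberately wrong entry.
assertions = {
    "uniprot:P05067": {"Protein"},
    "example:x1": {"Protein", "DNA"},
}


def disjointness_violations(assertions, disjoint_pairs):
    """List (individual, classes) for members of two disjoint classes."""
    violations = []
    for individual, classes in sorted(assertions.items()):
        for pair in disjoint_pairs:
            if pair <= classes:  # both disjoint classes are asserted
                violations.append((individual, tuple(sorted(pair))))
    return violations


print(disjointness_violations(assertions, disjoint_pairs))
```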

Page 32

Key Idea: Class equivalence

Transcription Factor

“A protein that binds to DNA and regulates gene expression.”

By stating the necessary and sufficient conditions we discover new knowledge
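The Transcription Factor definition can be sketched in Python as a rule that classifies any individual meeting all the stated conditions. The individuals and the `Individual` type are invented for illustration; a real OWL reasoner would derive this from an equivalent-class axiom and would also apply it in the other direction (every Transcription Factor is a protein that binds DNA and regulates gene expression).

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    types: set = field(default_factory=set)
    binds_to: set = field(default_factory=set)
    regulates: set = field(default_factory=set)


def classify_transcription_factors(individuals):
    """Add 'Transcription Factor' to individuals meeting all conditions."""
    for ind in individuals:
        if (
            "Protein" in ind.types
            and "DNA" in ind.binds_to
            and "gene expression" in ind.regulates
        ):
            ind.types.add("Transcription Factor")
    return individuals


# Hypothetical individuals: only the first satisfies every condition.
p1 = Individual("example:p1", {"Protein"}, {"DNA"}, {"gene expression"})
p2 = Individual("example:p2", {"Protein"})
classify_transcription_factors([p1, p2])
print(p1.name, sorted(p1.types))
print(p2.name, sorted(p2.types))
```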

Page 33

Many ontologies required

Barry Smith

Page 34

Over 170 bio-ontologies

Page 35

We’re interested in Personalized Medicine

The ability to offer
• The Right Drug
• To The Right Patient
• For The Right Disease
• At The Right Time
• With The Right Dosage

Genetic and metabolic data will allow drugs to be tailored to patient subgroups

Page 36

PHARMGKB is an emerging resource for pharmacogenomics

+ Role of genes, gene variants, drugs
+ pharmacokinetics
+ pharmacodynamics
+ clinical outcomes
+ Links to publications

- Natural language descriptions
- Variant details in publications

Page 37

PHARMACOGENOMICS OF DEPRESSION KNOWLEDGE BASE

Contains statements from 11/40 relevant publications involving 45 genes / gene variants, 57 drugs annotated with 19 classes of antidepressants, 45 drug treatments, 47 drug-gene interactions, 29 clinical outcomes, 10 drug-induced side-effects, and 8 gene-disease interactions.

Page 38

QUERYING THE PDKB

Protégé 4, FaCT++, DL Query Tab

Nortriptyline induced side effects for ABCB1 gene variants

‘side effect’ that ‘is realized by’ some (‘drug treatment’ that ‘involves’ some ‘nortriptyline’ and ‘involves’ some (‘variant of’ some ‘ABCB1’))

Postural hypotension is a side effect of nortriptyline treatment of depression for individuals presenting the 3435C>T genotype
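The shape of this query can be mimicked over a tiny in-memory dataset in Python. This is only a sketch of the pattern "side effect realized by a treatment that involves the drug and a variant of the gene"; it does not use Protégé or FaCT++, the treatment identifier treatment_1 is invented, and the single record simply restates the result shown on the slide.

```python
# Toy data restating the slide's result; "treatment_1" is an invented id.
variant_of = {"3435C>T": "ABCB1"}
treatments = {"treatment_1": {"nortriptyline", "3435C>T"}}
realized_by = {"postural hypotension": "treatment_1"}


def side_effects(drug, gene):
    """Side effects realized by a treatment involving drug and a gene variant."""
    hits = []
    for effect, treatment in sorted(realized_by.items()):
        involved = treatments[treatment]
        has_variant = any(variant_of.get(item) == gene for item in involved)
        if drug in involved and has_variant:
            hits.append(effect)
    return hits


print(side_effects("nortriptyline", "ABCB1"))
```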

Page 39

Web-based Knowledge Discovery

Some of our queries need services

Page 40

The Holy Grail:

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 42

Semantic Automated Discovery and Integration

http://sadiframework.org

Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB

Page 43

As OWL Axioms

HomologousGeneImage is owl:equivalentTo {

Gene Q hasImage image P

Gene Q hasSequence Sequence Q

Gene R hasSequence Sequence R

Sequence Q similarTo Sequence R

Gene R = “my gene of interest” }

Page 44

Build a knowledge base from a series of questions

Page 45

You want to join the knowledge web

Page 46

Share your data

Page 47

Bridge your data with others in semantic communities

Page 48

Time-sensitive or frequently updated data is one way to encourage more visits.

Page 51

The Knowledge Web

• Merging data & services

• Reasoning & question answering

• Persistent (RESTful)

• Trust & Security

Data consumers must be able to rely upon your data to use it as a foundation for their own applications.

Page 52

Join the knowledge web.
