Research… this time it’s Personal! Mark Wilkinson, PI Bioinformatics, Heart + Lung Institute @ St. Paul’s Hospital, Vancouver, BC, Canada

C:\fakepath\bioit world2010


Page 1: C:\fakepath\bioit world2010

Research… this time it’s Personal!

Mark Wilkinson, PI Bioinformatics, Heart + Lung Institute @ St. Paul’s Hospital, Vancouver, BC, Canada

Page 2: C:\fakepath\bioit world2010

DEMO

...to give you incentive to listen to the rest of the presentation

;-)

Page 3: C:\fakepath\bioit world2010

Show me patients with elevated creatinine along with their latest BUN and creatinine levels

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>

SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
  ?patient rdf:type patients:ElevatedCreatininePatient .
  ?patient pred:latestBUN ?bun .
  ?patient pred:latestCreatinine ?creat .
}

Page 4: C:\fakepath\bioit world2010

VOILA!

Page 5: C:\fakepath\bioit world2010

There was no database...

There was no patient data anywhere annotated as

“Elevated Creatinine Patient”

Page 6: C:\fakepath\bioit world2010

How did I answer a question where the required data didn’t exist?

(...no, I didn’t just make it up! LOL!)

Page 7: C:\fakepath\bioit world2010

The story begins...

Page 8: C:\fakepath\bioit world2010

This is going to hurt...

Page 9: C:\fakepath\bioit world2010

Web Services

vs.

Semantic Web

Page 10: C:\fakepath\bioit world2010

Web Services are not “connected” to the Semantic Web

Why?

Page 11: C:\fakepath\bioit world2010

Web Services: XML + XML Schema

Semantic Web: RDF + OWL

Page 12: C:\fakepath\bioit world2010

Web Services: POST of SOAP-XML

Semantic Web: GET of RDF-XML

Page 13: C:\fakepath\bioit world2010

Web Services: No (rigorous) semantics

Semantic Web: Rich, flexible semantics

Page 14: C:\fakepath\bioit world2010

Web Services & Semantic Web

Fundamentally and deeply different Web technologies!

Page 15: C:\fakepath\bioit world2010
Page 16: C:\fakepath\bioit world2010

>1000 X more data in the “Deep Web” than in Web pages

Page 17: C:\fakepath\bioit world2010

Accessing these databases and analytical algorithms “transparently”, based on an individual researcher’s ideas, beliefs, and preferences, will help us personalize medical research

Page 18: C:\fakepath\bioit world2010

Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12

Page 19: C:\fakepath\bioit world2010

Semantic Web? (my definition)

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had not anticipated.

Page 20: C:\fakepath\bioit world2010

Re-interpretation

Correct re-use

Both are critical to the personalization of research

Page 21: C:\fakepath\bioit world2010

Building a personalized Semantic Web…

Step-by-step…

Page 22: C:\fakepath\bioit world2010

Founding partner

Semantic Automated Discovery and Integration
http://sadiframework.org (open source)

Microsoft Research

Page 23: C:\fakepath\bioit world2010

SADI

“best-practices” for Semantic Web Service provision

Page 24: C:\fakepath\bioit world2010

standards-compliant

Page 25: C:\fakepath\bioit world2010

Lightweight (only 2 “rules”)

Page 26: C:\fakepath\bioit world2010

Rules come from observations:

Page 27: C:\fakepath\bioit world2010

SADI Observation #1:

Web Services in Bioinformatics create implicit biological relationships between their input and output

Page 28: C:\fakepath\bioit world2010

SADI Observation #1:

Page 29: C:\fakepath\bioit world2010

SADI Best Practice #1

Make the implicit explicit…

A Web Service should create “triples” linking the input data to the output data, thus explicitly describing the semantic relationship between them

Page 30: C:\fakepath\bioit world2010

SADI Best Practice #1

This is what bioinformatics Web Services implicitly do anyway!

Easy to implement this as a best-practice
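
As a rough illustration of Best Practice #1, here is a minimal Python sketch in which a service returns its result as triples that name the input. It uses plain tuples rather than a real RDF library, and the URIs, the predicate name, and the lookup table are all hypothetical stand-ins, not part of SADI itself.

```python
# Minimal sketch of Best Practice #1 using plain tuples as triples.
# All URIs and the lookup table are hypothetical, for illustration only.

HAS_DNA_SEQUENCE = "http://example.org/predicates#hasDNASequence"

# Stand-in for whatever database or algorithm a real service would wrap.
# The sequence is a truncated placeholder, not real BRCA1 data.
SEQUENCES = {"http://example.org/gene/BRCA1": "AGCTTAGCCA"}


def dna_sequence_service(gene_uri):
    """Return (subject, predicate, object) triples linking input to output."""
    sequence = SEQUENCES[gene_uri]
    # The implicit "this gene has this sequence" relationship becomes explicit.
    return [(gene_uri, HAS_DNA_SEQUENCE, sequence)]


output_triples = dna_sequence_service("http://example.org/gene/BRCA1")
for triple in output_triples:
    print(triple)
```

The point of the sketch is only that the output is not a bare sequence string: it carries the input identifier and a named relationship along with it.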

Page 31: C:\fakepath\bioit world2010

SADI Observation #2: HTTP GET and POST

GET guarantees the response relates to the request URI in a very precise and predictable way

POST does not…

Page 32: C:\fakepath\bioit world2010

SADI Observation #2: GET and POST

That’s why Web Services have a fundamentally different behaviour than the Semantic Web

Page 33: C:\fakepath\bioit world2010

SADI Observation #2: GET and POST

We can fix that!

(without breaking any existing rules or standards!)

Page 34: C:\fakepath\bioit world2010

SADI Best Practice #2

SUBJECT URI of the output graph (triples) is the same as the SUBJECT URI of the input graph (triples)

(the output is “about” the input... Now explicitly!)
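
One way to picture Best Practice #2 is as a check that a client could run on any service response. The sketch below assumes flat graphs represented as lists of (subject, predicate, object) tuples. The URIs and predicate names are made up for illustration, and a real SADI output graph may be richer than this.

```python
# Sketch of a Best Practice #2 check, assuming flat graphs of
# (subject, predicate, object) tuples. All identifiers are hypothetical.

def subjects(triples):
    """Collect the set of subject URIs used in a list of triples."""
    return {subject for subject, _predicate, _obj in triples}


def output_is_about_input(input_triples, output_triples):
    """True when the output graph describes exactly the input subjects."""
    return subjects(output_triples) == subjects(input_triples)


GENE = "http://example.org/gene/BRCA1"

input_graph = [(GENE, "rdf:type", "GeneID")]
good_output = [(GENE, "hasDNASequence", "AGCTTAGCCA")]
# A response keyed on some new, service-specific URI breaks the rule.
bad_output = [("http://example.org/result/42", "hasDNASequence", "AGCTTAGCCA")]

print(output_is_about_input(input_graph, good_output))
print(output_is_about_input(input_graph, bad_output))
```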

Page 35: C:\fakepath\bioit world2010

Consequence

The “Semantics” of our interaction with the Web Service are now explicit and identical to the “Semantics” of GET

Page 36: C:\fakepath\bioit world2010

SADI Web Service Interfaces

Service Interfaces defined by two OWL classes:

Page 37: C:\fakepath\bioit world2010

SADI Web Service Interfaces

OWL Class #1: My Input Class

Page 38: C:\fakepath\bioit world2010

SADI Web Service Interfaces

OWL Class #2: My Output Class

Page 39: C:\fakepath\bioit world2010

SADI Web Service Interfaces

My Service consumes OWL Individuals of Class #1 and returns OWL Individuals of Class #2

…but the URI of those two individuals is the same! (see best practice #2)

Page 40: C:\fakepath\bioit world2010

How do we discover services?

Since input and output are about the same “thing”, we can automatically determine what a service does by comparing the Input and Output OWL classes
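
The comparison can be pictured as a set difference. The following Python sketch makes a strong simplifying assumption: each OWL class is reduced to the set of property names its individuals must carry. Real OWL classes need a reasoner to compare, and the class contents below are hypothetical.

```python
# Sketch only: an OWL class is approximated as a set of required properties.
# A real comparison would use an OWL reasoner; names here are hypothetical.

def properties_added(input_class, output_class):
    """Properties required by the output class but not by the input class."""
    return output_class - input_class


gene_id_input = frozenset({"rdf:type"})
gene_with_sequence_output = frozenset({"rdf:type", "hasDNASequence"})

added = properties_added(gene_id_input, gene_with_sequence_output)
print(sorted(added))
```

Whatever remains after the subtraction is, under this assumption, a description of what the service contributes.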

Page 41: C:\fakepath\bioit world2010

Automatically index services in a registry based on what properties (predicates) Services add to their respective input data

How do we discover services?

Page 42: C:\fakepath\bioit world2010

EXAMPLE

Input Data: BRCA1 rdf:type Gene ID

Output Data: BRCA1 hasDNASequence AGCTTAGCCA…

Registry Index: Service provides “hasDNASequence” property to Gene IDs
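
A registry of this kind can be sketched as a lookup table keyed on the input type and the predicate a service adds. The class below is a toy in-memory version written for illustration. The class name, method names, and service name are assumptions and do not reflect the actual SADI registry API.

```python
# Toy in-memory registry, for illustration only. The names ServiceRegistry,
# register, find and "sequenceService" are hypothetical.
from collections import defaultdict


class ServiceRegistry:
    def __init__(self):
        # Maps (input type, predicate) to the services that provide it.
        self._index = defaultdict(list)

    def register(self, service_name, input_type, added_predicates):
        """Index a service under every predicate it adds to its input."""
        for predicate in added_predicates:
            self._index[(input_type, predicate)].append(service_name)

    def find(self, input_type, predicate):
        """Return the services that attach `predicate` to `input_type`."""
        return list(self._index.get((input_type, predicate), []))


registry = ServiceRegistry()
registry.register("sequenceService", "GeneID", ["hasDNASequence"])

print(registry.find("GeneID", "hasDNASequence"))
print(registry.find("GeneID", "hasPathway"))
```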

Page 43: C:\fakepath\bioit world2010

Now we can answer questions like

“what is the DNA sequence of BRCA1?”

Discover a SADI Web Service that generates the DNA Sequence property for gene identifiers

Page 44: C:\fakepath\bioit world2010

Okay, enough tech gobbledygook

What will this do for ME?

Page 45: C:\fakepath\bioit world2010

Demo #1

Page 46: C:\fakepath\bioit world2010

Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis

Page 47: C:\fakepath\bioit world2010

How do we query that database?

Page 48: C:\fakepath\bioit world2010

“SHARE”

Semantic Health And Research Environment

SADI client application

Page 49: C:\fakepath\bioit world2010
Page 50: C:\fakepath\bioit world2010

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>

SELECT ?gene ?pathway
WHERE {
  uniprot:P47989 pred:isEncodedBy ?gene .
  ?gene ont:isParticipantIn ?pathway .
}

Page 51: C:\fakepath\bioit world2010
Page 52: C:\fakepath\bioit world2010
Page 53: C:\fakepath\bioit world2010
Page 54: C:\fakepath\bioit world2010

Recap: what we just saw

A standard SPARQL query was entered into SHARE, a SADI-aware query engine

Page 55: C:\fakepath\bioit world2010

Recap: what we just saw

The query was interpreted to extract the properties being queried and these were passed to SADI for Web Service discovery

Page 56: C:\fakepath\bioit world2010

Recap: what we just saw

SADI searched for, found, and accessed all databases and/or analytical tools capable of generating those properties

Page 57: C:\fakepath\bioit world2010

Recap: what we just saw

We posed and answered a complex database query

WITHOUT A DATABASE

(in fact, the data didn’t even have to exist...)

Page 58: C:\fakepath\bioit world2010

Cool!

Page 59: C:\fakepath\bioit world2010

…but I’m supposed to be personalizing research…

Let’s make this a little more personal by bringing in Ontologies

Page 60: C:\fakepath\bioit world2010

My Definition of Ontology (for this talk)

Ontologies explicitly define the things that exist in “the world” based on what properties each kind of thing must have

Page 61: C:\fakepath\bioit world2010

Ontology Spectrum (from least to most formal)

Catalog/ID
Terms/glossary
Thesauri (“narrower term” relation)
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value restrictions
Selected logical constraints (disjointness, inverse, …)
General logical constraints

Page 62: C:\fakepath\bioit world2010

Demo #2

Discover instances of OWL classes from data that doesn’t exist…

Page 63: C:\fakepath\bioit world2010

Show me patients with elevated creatinine along with their latest BUN and creatinine levels

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>

SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
  ?patient rdf:type patients:ElevatedCreatininePatient .
  ?patient pred:latestBUN ?bun .
  ?patient pred:latestCreatinine ?creat .
}

Page 64: C:\fakepath\bioit world2010

Start burrowing through the OWL class; find that we need a regression model OWL class

Page 65: C:\fakepath\bioit world2010

Regression models have features like slopes and intercepts… and so on

The class is completely decomposed until a set of Services capable of creating all the necessary properties is discovered

Page 66: C:\fakepath\bioit world2010

Successful decomposition of the OWL class to discover the need for a Linear Regression Web Service, and so on
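
To make the regression step concrete, here is a small Python sketch of what a linear regression service might compute for one patient. It assumes, purely for illustration, that "increasing over time" means a positive least-squares slope. The function names and the measurements are invented, and the ontology shown in the demo may define the class differently.

```python
# Sketch of the kind of computation a linear regression service could run.
# Assumption: "increasing" means a positive least-squares slope.
# The measurements below are invented for illustration, not patient data.

def least_squares_slope(times, values):
    """Slope of the ordinary least-squares line through (time, value) pairs."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / len(times)
    mean_v = sum(values) / len(values)
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    if variance == 0:
        raise ValueError("time points must not all be identical")
    return covariance / variance


def is_increasing(times, values):
    """Hypothetical membership test: positive slope over time."""
    return least_squares_slope(times, values) > 0


days = [0, 1, 2, 3]
creatinine = [1.0, 1.2, 1.4, 1.6]

print(least_squares_slope(days, creatinine))
print(is_increasing(days, creatinine))
```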

Page 67: C:\fakepath\bioit world2010

VOILA!

Page 68: C:\fakepath\bioit world2010

OWL Class restrictions converted into workflows

SPARQL queries converted into workflows

Reasoning happening in parallel with query execution

Data fulfilling OWL models is discovered, or generated through running analytical tools

SADI and CardioSHARE

Page 69: C:\fakepath\bioit world2010

I still don’t see why this is “Personal”

??

Page 70: C:\fakepath\bioit world2010

Show me patients whose creatinine level is increasing over time, along with their latest BUN and creatinine

SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
  ?patient rdf:type patients:ElevatedCreatininePatient .
  ?patient pred:latestBUN ?bun .
  ?patient pred:latestCreatinine ?creat .
}

Page 71: C:\fakepath\bioit world2010

I created a small ontology describing my definition of an Elevated Creatinine Patient

Page 72: C:\fakepath\bioit world2010

… it was MY ontology!

Page 73: C:\fakepath\bioit world2010

I can re-use it

Page 74: C:\fakepath\bioit world2010

I can modify it as I change my world-view

Page 75: C:\fakepath\bioit world2010

I can publish it for others to use

Page 76: C:\fakepath\bioit world2010

Others can modify it to fit THEIR world-view

Page 77: C:\fakepath\bioit world2010
Page 78: C:\fakepath\bioit world2010

My personal world-view is being dynamically resolved against global data and knowledge

Page 79: C:\fakepath\bioit world2010

…but it’s bigger than that…

Page 80: C:\fakepath\bioit world2010

“Elevated Creatinine Patient”

Page 81: C:\fakepath\bioit world2010

I made that up! It came out of my head!

Page 82: C:\fakepath\bioit world2010

What’s a fancy word for a world-view that you make up?

Hypothesis

Page 83: C:\fakepath\bioit world2010

Current Research

We believe that ontologies and hypotheses are, in some ways, the same “thing”…

…simply assertions about individuals that may or may not exist

Future SADI client applications will support data-driven hypothesis generation and resolution

Page 84: C:\fakepath\bioit world2010

Recap

SADI Semantic Web Services generate triples; the predicates of those triples are indexed... Period.

For a given query, determine which properties are available, and which need to be discovered/generated

Find services that generate the properties we need
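
The recap above can be sketched as a small planning step in Python. This is a simplified illustration under stated assumptions, not SHARE's actual algorithm: the query is reduced to a list of predicate names, the registry is a plain dictionary, and the service names are hypothetical.

```python
# Simplified planning sketch, not SHARE's actual algorithm.
# The registry contents and service names are hypothetical.

def plan_query(required_predicates, available_predicates, registry):
    """Split predicates into already-available and to-be-discovered,
    then look up candidate services for the latter."""
    available = set(available_predicates)
    to_discover = [p for p in required_predicates if p not in available]
    services = {p: registry.get(p, []) for p in to_discover}
    return to_discover, services


required = ["rdf:type", "pred:latestBUN", "pred:latestCreatinine"]
available = ["rdf:type"]
registry = {
    "pred:latestBUN": ["bunService"],
    "pred:latestCreatinine": ["creatinineService"],
}

to_discover, services = plan_query(required, available, registry)
print(to_discover)
print(services)
```

A predicate with no registered service maps to an empty list, which a client could treat as "this part of the query cannot be answered yet".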

Page 85: C:\fakepath\bioit world2010

Semantic Web

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had not anticipated.

My Purpose!!

Page 86: C:\fakepath\bioit world2010

What SADI + SHARE supports

Re-interpretation

We constantly compare the collection of properties, gathered from third parties worldwide, to whatever world-model (query/ontology) we wish to view it through.

MY world model

Page 87: C:\fakepath\bioit world2010

Novel re-use

There is no way for the provider to dictate how their data should be used, or how it should be interpreted. They simply add their properties into the “data cloud” and those properties are used in whatever way is appropriate for ME.

What SADI + SHARE supports

Page 88: C:\fakepath\bioit world2010

And all this because SADI simply requires that the input URI is the same as the output URI

Page 89: C:\fakepath\bioit world2010

Semi-automated SADI service writing and deployment

Taverna

Semantically-guided SADI service discovery and pipelining

SADI Plug-ins

Page 90: C:\fakepath\bioit world2010

Simple and Open WINS!

Join us!

We have recently received funding from CANARIE to assist and train service providers in deploying their own SADI Semantic Web Services

Come join us – we’re having a lot of fun!!

http://sadiframework.org
http://twitter.com/sadiframework

Page 91: C:\fakepath\bioit world2010

Fin

This presentation available on SlideShare: keywords ‘wilkinson’ ‘BioIT-2010’

Page 92: C:\fakepath\bioit world2010

Credits

Benjamin VanderValk (SADI & CardioSHARE)

Luke McCarthy (SADI & CardioSHARE)

Soroush Samadian (CardioSHARE)

Page 93: C:\fakepath\bioit world2010

Microsoft Research