Using OWL Domain Models as Abstract Workflow Models Or... Conducting in silico research in the Web from hypothesis to publication Mark Wilkinson Isaac Peral Senior Researcher in Biological Informatics Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain Adjunct Professor of Medical Genetics, University of British Columbia Vancouver, BC, Canada.

Web Science - ISoLA 2012


DESCRIPTION

This is a brief version of earlier talks, but I think it explains more emphatically what I think Web Science is, why I believe it is realistic, and how SADI/SHARE technologies (or technologies like them) are important to achieving the vision.


Page 1: Web Science - ISoLA 2012

Using OWL Domain Models as Abstract Workflow Models

Or...Conducting in silico research in the Web

from hypothesis to publication

Mark Wilkinson

Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain

Adjunct Professor of Medical Genetics, University of British Columbia
Vancouver, BC, Canada.

Page 2: Web Science - ISoLA 2012

Context

“While it took 2,300 years after the first report of angina for the condition to be commonly taught in medical curricula, modern discoveries are being disseminated at an increasingly rapid pace. Focusing on the last 150 years, the trend still appears to be linear, approaching the axis around 2025.”

The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009

Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA, June 22, 2012.

Page 3: Web Science - ISoLA 2012

The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009

Slide borrowed with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA, June 22, 2012.

“The Singularity”

The X-intercept is where, the moment a discovery is made, it is immediately put into practice

(not only medical practice, but any research endeavour...)

Page 4: Web Science - ISoLA 2012

The technology required to achieve this

does not yet exist

Page 5: Web Science - ISoLA 2012

Scientific research would have to be conducted within a medium that

immediately interpreted and disseminated the results...

You Are

Here

Page 6: Web Science - ISoLA 2012

...in a form that immediately (actively!) affected the research of others...

You Are

Here

Page 7: Web Science - ISoLA 2012

...without requiring them to be aware of these new discoveries.

You Are

Here

Page 8: Web Science - ISoLA 2012

To achieve this vision

We must learn how to do research IN the Web

Not OVER the Web

Page 9: Web Science - ISoLA 2012

How we use the Web today

Page 10: Web Science - ISoLA 2012

To achieve this vision

We must learn how to do research IN the Web

Not OVER the Web

Page 11: Web Science - ISoLA 2012

I’d like to show you how close we now are to this vision

and how we got there

Page 12: Web Science - ISoLA 2012

Web Science 2.0

Page 13: Web Science - ISoLA 2012

We wanted to duplicate a real, peer-reviewed, bioinformatics analysis

simply by building a model in the Web describing what the answer

(if one existed)

would look like

Page 14: Web Science - ISoLA 2012

...the machine had to make every other decision

on its own

Page 15: Web Science - ISoLA 2012

This is the study we chose:

Page 16: Web Science - ISoLA 2012

Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).

Page 17: Web Science - ISoLA 2012

Original Study Simplified

Using what is known about interactions in fly & yeast

predict new interactions with your human protein of interest

Page 18: Web Science - ISoLA 2012

Given a protein P in Species X

Find proteins similar to P in Species Y

Retrieve interactors in Species Y

Sequence-compare Y-interactors with Species X genome

(1) Keep only those with homologue in X

Find proteins similar to P in Species Z

Retrieve interactors in Species Z

Sequence-compare Z-interactors with (1)

Putative interactors in Species X

Abstracted
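
The abstracted steps above can be sketched in a few lines of Python. This is only an illustration, not the method of the original study or of SHARE: the `similar` and `interactors` callables are hypothetical stand-ins for real homology-search and interaction-lookup resources, the toy data is invented, and the final comparison is simplified to a set intersection of the Species X homologues reached through each model organism.

```python
from typing import Callable, Set

# Hypothetical stand-ins for real resources (not part of SADI or SHARE):
Similar = Callable[[str, str], Set[str]]   # (protein, species) -> similar proteins in that species
Interactors = Callable[[str], Set[str]]    # protein -> its known interactors


def putative_interactors(protein: str, species_x: str, species_y: str, species_z: str,
                         similar: Similar, interactors: Interactors) -> Set[str]:
    """Map P into Y and Z, collect interactors there, map them back into X,
    and keep only the candidates supported by both model species."""

    def candidates_via(model_species: str) -> Set[str]:
        found: Set[str] = set()
        for homologue in similar(protein, model_species):
            for partner in interactors(homologue):
                # Keep only interactors that have a homologue in Species X.
                found |= similar(partner, species_x)
        return found

    return candidates_via(species_y) & candidates_via(species_z)


# Invented toy data, for illustration only.
HOMOLOGY = {
    ("P", "Y"): {"Py"}, ("P", "Z"): {"Pz"},
    ("Ya", "X"): {"Xa"}, ("Yb", "X"): {"Xb"},
    ("Za", "X"): {"Xa"}, ("Zc", "X"): {"Xc"},
}
INTERACTIONS = {"Py": {"Ya", "Yb"}, "Pz": {"Za", "Zc"}}

result = putative_interactors(
    "P", "X", "Y", "Z",
    similar=lambda p, s: HOMOLOGY.get((p, s), set()),
    interactors=lambda p: INTERACTIONS.get(p, set()),
)
print(result)
```

Only `Xa` is reachable through both Species Y and Species Z in the toy data, so it is the single putative interactor.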

Page 19: Web Science - ISoLA 2012

Modeling the answer...

OWL

Web Ontology Language (OWL) is the language approved by the W3C

for representing knowledge in the Web

Page 20: Web Science - ISoLA 2012

Modeling the answer...

Note that every word in this diagram is, in reality, a URL (because it is OWL)

Page 21: Web Science - ISoLA 2012

Modeling the answer...

The model of a Potential Interactor is published in The Web

It utilizes concepts from other models published in The Web (ours and others’) by referencing their URLs

Page 22: Web Science - ISoLA 2012

Modeling the answer...

The model of a Potential Interactor is a network of concepts distributed within the Web

It will be affected by changes to those concepts

We do not “own” all of those concepts!

Page 23: Web Science - ISoLA 2012

ProbableInteractor is homologous to (Potential Interactor from ModelOrganism1…)

and

(Potential Interactor from ModelOrganism2…)

Probable Interactor is defined in OWL as a subclass of Potential Interactor that requires homologous pairs of interacting proteins to exist in both

comparator model organisms.

(Effectively, an intersection)

Modeling the answer...

Page 24: Web Science - ISoLA 2012

Publish our OWL model of a Probable Interactor

in the Web

Page 25: Web Science - ISoLA 2012

In a local data-file

provide the protein we are interested in

and the two species we wish to use in our comparison

taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly

Running a Web Science 2.0 Experiment

Page 26: Web Science - ISoLA 2012

The tricky bit is...

In the abstract, the search for homology is “generic” – ANY model

organism.

But when the machine attempts to do the

experiment, it will have to use several different and specific resources because our question specifies two different

species

taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly

Page 27: Web Science - ISoLA 2012

PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>

SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {

?protein a i:ProbableInteractor . }

This is the question we ask: (the query language here is SPARQL)

The reference (URL) to our OWL model of the answer

Page 28: Web Science - ISoLA 2012

Our system then derives (and executes) the following workflow automatically

These are different Web services!

...selected at run-time based on the same model

Page 29: Web Science - ISoLA 2012
Page 30: Web Science - ISoLA 2012

There are three very cool things about what you just saw...

Page 31: Web Science - ISoLA 2012

There are three very cool things about what you just saw...

The system was able to create a workflow based on an OWL model (ontology)

Page 32: Web Science - ISoLA 2012

There are three very cool things about what you just saw...

The system was able to create a COMPUTATIONAL workflow

based on a BIOLOGICAL model

Page 33: Web Science - ISoLA 2012

There are three very cool things about what you just saw...

The workflow it created (i.e. the services chosen)

differed depending on context

taxon:4932 a i:ModelOrganism1 . # yeast

taxon:7227 a i:ModelOrganism2 . # fly
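
A minimal sketch of what "differed depending on context" could mean in code: the same abstract step resolves to a different concrete service depending on the organism named in the input data. The registry and the service names here are invented for illustration; the real system discovers services from their semantic descriptions rather than from a hard-coded table.

```python
from typing import Callable, Dict, List

# Hypothetical registry: one abstract step ("find interactors") backed by a
# different concrete service per species. Service names are invented.
SERVICES: Dict[str, Callable[[str], str]] = {
    "taxon:4932": lambda protein: f"yeast-interaction-service({protein})",
    "taxon:7227": lambda protein: f"fly-interaction-service({protein})",
}


def plan_workflow(model_organisms: List[str], protein: str) -> List[str]:
    """Resolve the same abstract step to one concrete service call per organism."""
    return [SERVICES[taxon](protein) for taxon in model_organisms]


plan = plan_workflow(["taxon:4932", "taxon:7227"], "uniprot:Q9UK53")
for step in plan:
    print(step)
```

Swapping the taxon identifiers in the input changes the plan without touching the model that drives it.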

Page 34: Web Science - ISoLA 2012

We got the answer

“simply” by designing a model of the answer!

Page 35: Web Science - ISoLA 2012

How did we do that?

Page 36: Web Science - ISoLA 2012

Design Pattern for Web Services on the Semantic Web

Page 37: Web Science - ISoLA 2012

A Web application that answers SPARQL-DL queries

Query-answering Enhanced by SADI

Page 38: Web Science - ISoLA 2012

Demos of SADI and SHARE

Page 39: Web Science - ISoLA 2012

What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene

SELECT ?allele ?image ?desc

WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image .

?image info:hasDescription ?desc }

Page 40: Web Science - ISoLA 2012

What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene

SELECT ?allele ?image ?desc

WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image .

?image info:hasDescription ?desc }

Note that there is no “FROM” clause! We don’t tell it where it should get the information. The machine has to figure that out by itself...

Page 41: Web Science - ISoLA 2012

Enter that query into SHARE

Page 42: Web Science - ISoLA 2012

Click “Submit”...

Page 43: Web Science - ISoLA 2012

SHARE examines available SADI Web Services...and in a few seconds you get your answer.

Page 44: Web Science - ISoLA 2012

The query results are live hyperlinks to the respective database or images

(the answer is IN the Web!)

Page 45: Web Science - ISoLA 2012

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {

uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway .

}

Page 46: Web Science - ISoLA 2012

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {

uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway .

}

Page 47: Web Science - ISoLA 2012

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {

uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway .

}

Note again that there is no “From” clause…

I have not told SHARE where to look for the answer, I am simply asking my question

Page 48: Web Science - ISoLA 2012

Enter that query into SHARE

Page 49: Web Science - ISoLA 2012
Page 50: Web Science - ISoLA 2012
Page 51: Web Science - ISoLA 2012

Two different providers of gene information (KEGG & NCBI) were found & accessed

Two different providers of pathway information (KEGG and GO) were found & accessed
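
One way to picture how a query with no FROM clause can still find its data is a registry indexed by predicate, where several providers may advertise the same predicate. The sketch below is an assumption-laden illustration: the `register` and `providers_for` helpers are hypothetical, and pairing each provider with a specific predicate from the query is inferred from the slide, not taken from the actual SADI registry.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Hypothetical predicate-indexed registry: predicate -> providers that can
# generate data with that predicate.
registry: DefaultDict[str, List[str]] = defaultdict(list)


def register(predicate: str, provider: str) -> None:
    """Record that a provider can attach the given predicate to its output."""
    registry[predicate].append(provider)


def providers_for(query_predicates: List[str]) -> Dict[str, List[str]]:
    """For each predicate in the query, list every provider that advertises it."""
    return {p: list(registry.get(p, [])) for p in query_predicates}


# Assumed pairing of providers to the predicates used in the query.
register("pred:isEncodedBy", "KEGG")
register("pred:isEncodedBy", "NCBI")
register("ont:isParticipantIn", "KEGG")
register("ont:isParticipantIn", "GO")

matches = providers_for(["pred:isEncodedBy", "ont:isParticipantIn"])
for predicate, providers in matches.items():
    print(predicate, "->", providers)
```

Because lookup is by predicate rather than by source, adding a new provider changes the answers to existing queries without the queries being edited.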

Page 52: Web Science - ISoLA 2012

The results are all links to the original data

(The answer is IN the Web!)

Page 53: Web Science - ISoLA 2012

Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants

(I showed you this query in ISoLA 2010… sorry for repeating myself)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {

?patient rdf:type patient:LikelyRejecter .
?patient l:latestBUN ?bun .
?patient l:latestCreatinine ?creat .

}

Page 54: Web Science - ISoLA 2012

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {

?patient rdf:type patient:LikelyRejecter .
?patient l:latestBUN ?bun .
?patient l:latestCreatinine ?creat .

}

Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants

(I showed you this query in 2010… sorry for repeating myself!)

Page 55: Web Science - ISoLA 2012

Likely Rejecter:

A patient who has creatinine levels that are increasing over time

- - Mark D Wilkinson’s definition

Page 56: Web Science - ISoLA 2012

Likely Rejecter:

…but there is no “likely rejecter” column or table in our database…

only blood chemistry measurements at various time-points

Page 57: Web Science - ISoLA 2012

Likely Rejecter:

So the data required to answer this question DOESN’T EXIST!

Page 58: Web Science - ISoLA 2012

?

Page 59: Web Science - ISoLA 2012

Enter that query into SHARE

Page 60: Web Science - ISoLA 2012

SHARE “decomposes” the Likely Rejecter OWL class

into its constituent property restrictions

Page 61: Web Science - ISoLA 2012

Each property restriction in the Class is matched with a SADI Service

The matched SADI Service can generate data that has that property

Page 62: Web Science - ISoLA 2012

SHARE chains these SADI services into a workflow...

...the outputs from that workflow are Instances (OWL Individuals) of the Likely Rejecter OWL Class
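
The decompose, match, and chain idea can be sketched as a tiny planner. Everything here is hypothetical: the `Service` tuple, the property names, the registry contents, and the patient data are invented, and the trend step is a crude first-to-last difference used only to keep the example short. It is not how SHARE itself is implemented.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Record = Dict[str, Any]


class Service(NamedTuple):
    name: str
    requires: str                   # property the input must already have
    provides: str                   # property the service attaches to its output
    run: Callable[[Record], Any]


def build_workflow(restrictions: List[str], known: List[str],
                   registry: List[Service]) -> List[Service]:
    """Chain services until every property restriction can be satisfied."""
    available = set(known)
    pending = [p for p in restrictions if p not in available]
    workflow: List[Service] = []
    while pending:
        usable = [s for s in registry
                  if s.provides in pending and s.requires in available]
        if not usable:
            raise LookupError(f"no service can provide: {pending}")
        chosen = usable[0]
        workflow.append(chosen)
        available.add(chosen.provides)
        pending.remove(chosen.provides)
    return workflow


def execute(workflow: List[Service], record: Record) -> Record:
    """Run each service in order, attaching its output property to the record."""
    record = dict(record)
    for service in workflow:
        record[service.provides] = service.run(record)
    return record


# Invented measurements: creatinine values at successive time-points.
SERIES = {"patient1": [1.0, 1.4, 1.9], "patient2": [1.8, 1.5, 1.1]}

# Registry deliberately listed out of order; the planner finds the chain.
REGISTRY = [
    Service("flag", "creatinineTrend", "isIncreasing",
            lambda r: r["creatinineTrend"] > 0),
    Service("lookup", "patientId", "creatinineSeries",
            lambda r: SERIES[r["patientId"]]),
    Service("trend", "creatinineSeries", "creatinineTrend",
            lambda r: r["creatinineSeries"][-1] - r["creatinineSeries"][0]),
]

# Hypothetical property restrictions of the class being decomposed.
RESTRICTIONS = ["creatinineSeries", "creatinineTrend", "isIncreasing"]

workflow = build_workflow(RESTRICTIONS, known=["patientId"], registry=REGISTRY)
likely = [pid for pid in SERIES
          if execute(workflow, {"patientId": pid})["isIncreasing"]]
print([s.name for s in workflow], likely)
```

The class membership ("is increasing") never appears in the stored data; it only exists once the chained services have produced it.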

Page 63: Web Science - ISoLA 2012

For example… SHARE utilizes SADI to discover analytical services on the Web that do linear regression analysis;

required for the “increasing over time” part of the Class definition
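
As a rough illustration of the “increasing over time” test, an ordinary least-squares slope can be computed directly. This is a minimal sketch under stated assumptions: the function names and sample values are invented, and treating a positive slope as “increasing” is a simplification; the actual services SHARE discovers may apply different criteria.

```python
from typing import Sequence


def regression_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against times."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all measurements share the same time-point")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def is_increasing(times: Sequence[float], values: Sequence[float]) -> bool:
    """Simplified criterion: a positive fitted slope counts as increasing."""
    return regression_slope(times, values) > 0


# Invented creatinine measurements at four time-points.
slope = regression_slope([0, 1, 2, 3], [1.0, 1.2, 1.4, 1.6])
print(round(slope, 3), is_increasing([0, 1, 2, 3], [1.0, 1.2, 1.4, 1.6]))
```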

Page 64: Web Science - ISoLA 2012

VOILA!

Page 65: Web Science - ISoLA 2012

SHARE examines the OWL Class

Gathers, from the Web, the ontologies that are referenced by that Class

then uses those ontological properties to identify which data-sources and analytical

tools it must access to create data matching that Class definition

Page 66: Web Science - ISoLA 2012

OWL

Page 67: Web Science - ISoLA 2012

The way SHARE builds the workflow varies depending on the context of the query

(i.e. which data/ontologies it reads – Mine? Yours?)

and on what part of the query it is trying to answer at any given moment

(which ontological concept is relevant to that clause)

Page 68: Web Science - ISoLA 2012

And that brings us back to...

Page 69: Web Science - ISoLA 2012

Web Science 2.0

Page 70: Web Science - ISoLA 2012

Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).

Page 71: Web Science - ISoLA 2012

derives and executes the following workflow automatically using an OWL ontology that describes the biology

Page 72: Web Science - ISoLA 2012

The analytical tools chosen for that workflow were determined based on

context

even though the biological (ontological) model driving their selection was the

same

Page 73: Web Science - ISoLA 2012

i.e.

The published model is re-usable

Page 74: Web Science - ISoLA 2012

i.e.

The published model is re-usable

In different contexts... by different researchers

Page 75: Web Science - ISoLA 2012

Because the model IS the experiment

the published EXPERIMENT is re-usable!!

Simply point the same query at your own dataset...

Page 76: Web Science - ISoLA 2012

The

scientific publication

is an

executable document!

Page 77: Web Science - ISoLA 2012

Every component of the model

Every component of the input data

Every component of the output data

is a URL

Therefore the model, the question, the experiment, and the results

are inherently IN the Web

Page 78: Web Science - ISoLA 2012

Every component of the model

Every component of the input data

Every component of the output data

is a URL

The answer, and the knowledge derived from it, is immediately available to Web search engines

and moreover, can instantly affect the outcome of other Web Science experiments

Page 79: Web Science - ISoLA 2012
Page 80: Web Science - ISoLA 2012

You Are Now Here!!!

Page 81: Web Science - ISoLA 2012

Change the way we think of “hypotheses”

Page 82: Web Science - ISoLA 2012

In Web Science 2.0

Model what the world would “look like” if your hypothesis were true

Then ask “is there any data that fits that model?”

Page 83: Web Science - ISoLA 2012

Please join us!

SADI and SHARE are Open-Source projects

http://sadiframework.org

Page 84: Web Science - ISoLA 2012

My New Home!

Page 85: Web Science - ISoLA 2012

Luke McCarthy – Lead Dev. Everything...

Benjamin VanderValk SHARE & SADI & Experimental modeling & myHeath Button

Soroush Samadian Cardiovascular data modeling and queries

University of British Columbia

Edward Kawas SADI Service auto-generator

Ian Wood Experimental modeling project

Page 86: Web Science - ISoLA 2012

U of New Brunswick

Dr. Chris Baker
Alexandre Riazanov

Carleton University

Dr. Michel Dumontier
Marc-Alexandre Nolin
Leonid Chepelev
Steve Etlinger
Nichaella Kieth
Jose Cruz

C-BRASS Collaborators at other sites

Page 87: Web Science - ISoLA 2012

Microsoft Research