This is a brief version of earlier talks, but I think it might explain more emphatically what I think Web Science is, why I believe it is realistic, and how SADI/SHARE technologies (or technologies like them) are important to achieving the vision.
Using OWL Domain Models as Abstract Workflow Models
Or...Conducting in silico research in the Web
from hypothesis to publication
Mark Wilkinson
Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
Context
“While it took 2,300 years after the first report of angina for the condition to be commonly taught in medical curricula, modern discoveries are being disseminated at an increasingly rapid pace. Focusing on the last 150 years, the trend still appears to be linear, approaching the axis around 2025.”
The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009
Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston, IL, USA, June 22, 2012.
“The Singularity”
The X-intercept is where, the moment a discovery is made, it is immediately put into practice
(not only medical practice, but any research endeavour...)
The technology required to achieve this
does not yet exist
Scientific research would have to be conducted within a medium that
immediately interpreted and disseminated the results...
You Are
Here
...in a form that immediately (actively!) affected the research of others...
You Are
Here
...without requiring them to be aware of these new discoveries.
You Are
Here
To achieve this vision
We must learn how to do research IN the Web
Not OVER the Web
How we use the Web today
I’d like to show you how close we now are to this vision
and how we got there
Web Science 2.0
We wanted to duplicate a real, peer-reviewed bioinformatics analysis
simply by building a model in the Web describing what the answer
(if one existed)
would look like
...the machine had to make every other decision
on its own
This is the study we chose:
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
Original Study Simplified
Using what is known about interactions in fly & yeast
predict new interactions with your human protein of interest
Given a protein P in Species X
Find proteins similar to P in Species Y
Retrieve interactors in Species Y
Sequence-compare Y-interactors with Species X genome
(1) Keep only those with homologue in X
Find proteins similar to P in Species Z
Retrieve interactors in Species Z
Sequence-compare Z-interactors with (1)
Putative interactors in Species X
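The steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the pipeline used in the original study: the protein identifiers, the `HOMOLOGS` and `INTERACTIONS` tables, and both function names are hypothetical, and real sequence comparison is replaced by a lookup table.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data. (protein, species) -> homologues found in that species.
HOMOLOGS: Dict[Tuple[str, str], Set[str]] = {
    ("P", "yeast"): {"yP"},
    ("P", "fly"): {"fP"},
    ("yA", "human"): {"hA"},
    ("yB", "human"): {"hB"},
    ("fA", "human"): {"hA"},
    ("fC", "human"): {"hC"},
}

# Hypothetical toy data. protein -> known interactors in the same species.
INTERACTIONS: Dict[str, Set[str]] = {
    "yP": {"yA", "yB"},
    "fP": {"fA", "fC"},
}


def interactors_via(protein: str, model_species: str, home_species: str) -> Set[str]:
    """Map interactors of the protein's homologues back to the home species."""
    candidates: Set[str] = set()
    for similar in HOMOLOGS.get((protein, model_species), set()):
        for partner in INTERACTIONS.get(similar, set()):
            # Keep only partners that have a homologue in the home species.
            candidates |= HOMOLOGS.get((partner, home_species), set())
    return candidates


def putative_interactors(protein: str, home: str, species_y: str, species_z: str) -> Set[str]:
    """Candidates supported by both model organisms (the intersection)."""
    via_y = interactors_via(protein, species_y, home)
    via_z = interactors_via(protein, species_z, home)
    return via_y & via_z


if __name__ == "__main__":
    print(putative_interactors("P", "human", "yeast", "fly"))
```

In this toy data only `hA` is supported by both yeast and fly, so it is the single putative interactor that survives.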
Abstracted
Modeling the answer...
OWL
Web Ontology Language (OWL) is the language approved by the W3C
for representing knowledge in the Web
Modeling the answer...
Note that every word in this diagram is, in reality, a URL (because it is OWL)
Modeling the answer...
The model of a Potential Interactor is published in The Web
It utilizes concepts from other models published in The Web (ours and others’) by referencing their URLs
Modeling the answer...
The model of a Potential Interactor is a network of concepts distributed within the Web
It will be affected by changes to those concepts
We do not “own” all of those concepts!
ProbableInteractor is homologous to
(Potential Interactor from ModelOrganism1…)
and
(Potential Interactor from ModelOrganism2…)
Probable Interactor is defined in OWL as a subclass of Potential Interactor that requires homologous pairs of interacting proteins to exist in both
comparator model organisms.
(Effectively, an intersection)
Modeling the answer...
Publish our OWL model of a Probable Interactor
in the Web
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
Running a Web Science 2.0 Experiment
The tricky bit is...
In the abstract, the search for homology is “generic” – ANY model
organism.
But when the machine attempts to do the
experiment, it will have to use several different and specific resources because our question specifies two different species
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor . }
This is the question we ask (the query language here is SPARQL)
The reference (URL) to our OWL model of the answer
Our system then derives (and executes) the following workflow automatically
These are different Web services!
...selected at run-time based on the same model
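One way to picture run-time selection is a registry lookup keyed on both the abstract step and the species supplied in the input data. The sketch below is an assumption-laden illustration and not SHARE's actual planner: the `Service` record, the service names, the step names, and the `plan` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    name: str   # hypothetical service name
    step: str   # abstract step the service can perform
    taxon: str  # species the service has data for


# Hypothetical registry; real SADI services describe themselves in OWL instead.
REGISTRY = [
    Service("yeast_homology_search", "find_homologs", "taxon:4932"),
    Service("fly_homology_search", "find_homologs", "taxon:7227"),
    Service("yeast_interaction_lookup", "find_interactors", "taxon:4932"),
    Service("fly_interaction_lookup", "find_interactors", "taxon:7227"),
]

# The abstract model is the same regardless of species.
ABSTRACT_STEPS = ["find_homologs", "find_interactors"]


def plan(taxon: str) -> List[str]:
    """Pick one concrete service per abstract step for the given species."""
    chosen = []
    for step in ABSTRACT_STEPS:
        matches = [s for s in REGISTRY if s.step == step and s.taxon == taxon]
        if not matches:
            raise LookupError(f"no service for step {step!r} and {taxon!r}")
        chosen.append(matches[0].name)
    return chosen


if __name__ == "__main__":
    for taxon in ("taxon:4932", "taxon:7227"):
        print(taxon, plan(taxon))
```

The same `ABSTRACT_STEPS` list yields two different concrete workflows, one per model organism.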
There are three very cool things about what you just saw...
The system was able to create a workflow based on an OWL model (ontology)
The system was able to create a COMPUTATIONAL workflow
based on a BIOLOGICAL model
The workflow it created (i.e. the services chosen)
differed depending on context
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
We got the answer
“simply” by designing a model of the answer!
How did we do that?
Design Pattern for Web Services on the Semantic Web
A Web application that answers SPARQL-DL queries
Query-answering Enhanced by SADI
Demos of SADI and SHARE
What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene
SELECT ?allele ?image ?desc
WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc }
Note that there is no “FROM” clause! We don’t tell it where it should get the information. The machine has to figure that out by itself...
Enter that query into SHARE
Click “Submit”...
SHARE examines available SADI Web Services...and in a few seconds you get your answer.
The query results are live hyperlinks to the respective Database or images
(the answer is IN the Web!)
What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
 uniprot:P47989 pred:isEncodedBy ?gene .
 ?gene ont:isParticipantIn ?pathway .
}
Note again that there is no “From” clause…
I have not told SHARE where to look for the answer, I am simply asking my question
Enter that query into SHARE
Two different providers of gene information (KEGG and NCBI) were found and accessed
Two different providers of pathway information (KEGG and GO) were found and accessed
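A rough way to think about answering a query with no “FROM” clause is that each predicate in the query is resolved against every service known to provide it, and the results are merged. The sketch below assumes a simple in-memory registry; the `SERVICES` table, the `gene:` and `pathway:` identifiers, and the function names are hypothetical and do not reflect the real KEGG, NCBI, or GO data, nor the actual SHARE implementation.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Provider = Callable[[str], Set[str]]


def make_provider(table: Dict[str, Iterable[str]]) -> Provider:
    """Wrap a toy lookup table as a service: subject in, objects out."""
    return lambda subject: set(table.get(subject, ()))


# Hypothetical registry: predicate -> every provider able to generate it.
SERVICES: Dict[str, List[Provider]] = {
    "pred:isEncodedBy": [
        make_provider({"uniprot:P47989": ["gene:A"]}),  # stand-in for provider 1
        make_provider({"uniprot:P47989": ["gene:B"]}),  # stand-in for provider 2
    ],
    "ont:isParticipantIn": [
        make_provider({"gene:A": ["pathway:1"]}),
        make_provider({"gene:A": ["pathway:2"], "gene:B": ["pathway:2"]}),
    ],
}


def resolve(subject: str, predicate: str) -> Set[str]:
    """Ask every provider of the predicate and merge their answers."""
    values: Set[str] = set()
    for provider in SERVICES.get(predicate, []):
        values |= provider(subject)
    return values


def answer_query(protein: str) -> Set[Tuple[str, str]]:
    """Chain the two triple patterns of the pathway query."""
    rows: Set[Tuple[str, str]] = set()
    for gene in resolve(protein, "pred:isEncodedBy"):
        for pathway in resolve(gene, "ont:isParticipantIn"):
            rows.add((gene, pathway))
    return rows


if __name__ == "__main__":
    for row in sorted(answer_query("uniprot:P47989")):
        print(row)
```

The caller never names a data source; the predicate alone determines which providers are consulted.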
The results are all links to the original data (The answer is IN the Web!)
Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants
(I showed you this query in ISoLA 2010… sorry for repeating myself)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
 ?patient rdf:type patient:LikelyRejecter .
 ?patient l:latestBUN ?bun .
 ?patient l:latestCreatinine ?creat .
}
Likely Rejecter:
A patient who has creatinine levels that are increasing over time
- - Mark D Wilkinson’s definition
…but there is no “likely rejecter” column or table in our database…
only blood chemistry measurementsat various time-points
So the data required to answer this question DOESN’T EXIST!
Enter that query into SHARE
SHARE “decomposes” the Likely Rejecter OWL class
into its constituent property restrictions
Each property restriction in the Class is matched with a SADI Service
The matched SADI Service can generate data that has that property
SHARE chains these SADI services into a workflow...
...the outputs from that workflow are Instances (OWL Individuals) of the Likely Rejecter OWL Class
For example… SHARE utilizes SADI to discover analytical services on the Web that do linear regression analysis,
which is required for the “increasing over time” part of the Class definition
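To make the “increasing over time” test concrete, here is a minimal sketch of what a regression service of this kind might compute: an ordinary least-squares slope over (time, creatinine) pairs. The function names, the sample values, and the choice of “slope greater than zero” as the cut-off are assumptions for illustration; the slides do not specify the threshold or the service's interface.

```python
from typing import Sequence, Tuple


def slope(points: Sequence[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of y against x."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all measurements share the same time-point")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine: Sequence[Tuple[float, float]],
                       threshold: float = 0.0) -> bool:
    """Assumed rule: creatinine trend (slope) above the threshold."""
    return slope(creatinine) > threshold


if __name__ == "__main__":
    # Hypothetical (day, creatinine) measurements for two patients.
    rising = [(0, 1.0), (1, 1.2), (2, 1.5)]
    falling = [(0, 1.5), (1, 1.2), (2, 1.0)]
    print(is_likely_rejecter(rising), is_likely_rejecter(falling))
```

The classification is derived from raw blood chemistry at query time, which is why no “likely rejecter” column needs to exist in the database.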
VOILA!
SHARE examines the OWL Class
Gathers, from the Web, the ontologies that are referenced by that Class
then uses those ontological properties to identify which data-sources and analytical
tools it must access to create data matching that Class definition
OWL
The way SHARE builds the workflow varies depending on the context of the query
(i.e. which data/ontologies it reads – Mine? Yours?)
and on what part of the query it is trying to answer at any given moment
(which ontological concept is relevant to that clause)
And that brings us back to...
Web Science 2.0
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
Our system derives and executes the following workflow automatically, using an OWL ontology that describes the biology
The analytical tools chosen for that workflow were determined based on
context
even though the biological (ontological) model driving their selection was the
same
i.e.
The published model is re-usable
In different contexts... by different researchers
Because the model IS the experiment
the published EXPERIMENT is re-usable!!
Simply point the same query at your own dataset...
The
scientific publication
is an
executable document!
Every component of the model
Every component of the input data
Every component of the output data
is a URL
Therefore the model, the question, the experiment, and the results
are inherently IN the Web
The answer, and the knowledge derived from it, is immediately available to Web search engines
and moreover, can instantly affect the outcome of other Web Science experiments
You Are Now Here!!!
Change the way we think of “hypotheses”
In Web Science 2.0
Model what the world would “look like” if your hypothesis were true
Then ask “is there any data that fits that model?”
Please join us!
SADI and SHARE are Open-Source projects
http://sadiframework.org
My New Home!
Luke McCarthy – Lead Dev. Everything...
Benjamin VanderValk SHARE & SADI & Experimental modeling & myHeath Button
Soroush Samadian Cardiovascular data modeling and queries
University of British Columbia
Edward Kawas SADI Service auto-generator
Ian Wood Experimental modeling project
U of New Brunswick
Dr. Chris Baker, Alexandre Riazanov
Carleton University
Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
C-BRASS Collaborators at other sites
Microsoft Research