Feasting on Brains!
From Web Services to Web 2.0 to the Semantic Web and back again…
A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences
Mark Wilkinson
Assistant Professor, Medical Genetics
University of British Columbia
Heart and Lung Research Institute at St. Paul’s Hospital
Benjamin Good (He’s a “Creep”!)
approach
“Bioinformatics” is a broad field and suffers SEVERE interoperability problems
Is it possible to extract the knowledge required for interoperability from the brains of bioinformaticians en masse?
As a group, the brains of all bioinformaticians contain all (known) bioinformatics
“Bioinformaticians” tend to be specialists in a particular domain of computational analysis
“Human Computation” (Luis von Ahn)
Ontology Spectrum
[Figure: a spectrum of increasingly formal ontologies. Labels: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints.]
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
An ontology is a representation of knowledge
[Figure: example ontology, shown in several animation steps. Nodes: Animal, Mammal, Primate, Lemur, Human, Zombie, Hair, Brains, Chips, Shoots. Labelled links: is_a, has, eats. Callouts mark which parts of the diagram are classes, instances, properties and relations.]
Classes, instances
properties, relationships
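A minimal Python sketch of the idea on this slide, assuming the ontology is stored as (subject, property, object) triples. The node and link names come from the diagram, but the individual edges are read off a flattened figure, so treat them as assumptions; in particular, attaching `has Hair` to Mammal and pairing Zombie with Brains are illustrative guesses.

```python
# Toy ontology as (subject, property, object) triples.
# Edges are assumptions reconstructed from the slide's labels.
TRIPLES = {
    ("Mammal", "is_a", "Animal"),
    ("Primate", "is_a", "Mammal"),
    ("Lemur", "is_a", "Primate"),
    ("Human", "is_a", "Primate"),
    ("Zombie", "is_a", "Primate"),
    ("Mammal", "has", "Hair"),
    ("Zombie", "eats", "Brains"),  # assumed pairing, for illustration only
}


def ancestors(term):
    """Return every class reachable from `term` by following is_a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, prop, obj in TRIPLES:
            if subject == current and prop == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def inherited(term, prop):
    """Collect values of `prop` asserted on `term` or on any of its ancestors."""
    scope = {term} | ancestors(term)
    return {obj for subject, p, obj in TRIPLES if subject in scope and p == prop}
```

With these triples, `inherited("Lemur", "has")` picks up Hair from Mammal, which is the kind of inference that makes an is_a hierarchy more useful than a flat list of terms.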
Web Service?
A software tool that is accessible over the Web
Web Services are intended to be accessed by machines, not people.
Interoperability?
The ability of two Web Services to exchange information, and use that information correctly
This generally requires Semantics in the form of Ontologies…
BioMoby: Eating brains to enable Web Service Interoperability
Mmmm… Brains!!
What does BioMoby do?
• Create an ontology of bioinformatics data-types
• Define an ontology of bioinformatics operations
• Open these ontologies for community input
• Define Web Services vis-à-vis these two ontologies
• A Machine can find an appropriate service
• A Machine can execute that service unattended
• Ontology is community-extensible
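The discovery step in the list above can be sketched as follows. This is a hedged illustration in Python, not the MOBY Central API: the data-type names, the `Service` record, the registry contents and `find_services` are all hypothetical. It only shows how declaring services against a shared data-type ontology lets a machine find services that accept the data in hand.

```python
from dataclasses import dataclass

# Hypothetical fragment of a data-type ontology: child -> parent (is_a).
DATA_TYPES = {
    "DNASequence": "Sequence",
    "ProteinSequence": "Sequence",
    "Sequence": "Object",
    "Alignment": "Object",
}


@dataclass(frozen=True)
class Service:
    name: str
    consumes: str   # data type accepted as input
    produces: str   # data type returned as output
    operation: str  # term from the operations ontology


# Hypothetical registry entries.
REGISTRY = [
    Service("alignSequences", "Sequence", "Alignment", "Align"),
    Service("designPrimers", "DNASequence", "DNASequence", "Primers"),
]


def lineage(data_type):
    """Return data_type followed by all of its ancestors in the ontology."""
    chain = [data_type]
    while chain[-1] in DATA_TYPES:
        chain.append(DATA_TYPES[chain[-1]])
    return chain


def find_services(data_type):
    """Names of services whose declared input is data_type or one of its ancestors."""
    accepted = set(lineage(data_type))
    return [service.name for service in REGISTRY if service.consumes in accepted]
```

A service registered for the general `Sequence` type is found for any more specific sequence type, so providers do not need to coordinate with each other, only with the ontology.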
The BioMoby Plan
[Figure: the BioMoby plan. Labels: Gene names; MOBY Central; MOBY hosts & services; data shown: Sequence, Alignment, Sequence, Express., Protein, Alleles, …; services shown: Align, Phylogeny, Primers.]
Overview of BioMoby Semantic Interoperability
Why couldn’t we do this before?
Interoperability is HARD!
Interoperability through Human Computation
BioMoby Data Type Ontology: An explicit list of all biological data-types, and the relationships between them.
Ontology built, brain by brain, by informaticians!
We achieve interoperability simply because informaticians donate their brain-power
HUMAN COMPUTATION
A portion of the BioMoby Ontology
…built from the brains of the community!
…so what can I do with it?
Analytical workflow Discovery
No explicit coordination between providers
Run-time discovery of appropriate tools
Automated execution of those tools
The machine “understands” the data you have in-hand, and assists you in choosing the next step in your analysis.
Interoperability through Human Computation
Individuals contributed their knowledge about bioinformatics data-types to a central ontology
Their combined knowledge enabled the construction of an interoperable framework
Who uses BioMoby?
Usage Statistics
15 Nations
> 60 independent institutions
>1600 interoperable Bioinformatics Resources
~500,000 requests for “brokering” each month
What have we learned?
We can consume the brains of a large community…
…to generate something complex, yet organized
Open Kimono
The BioMoby ontology is actually quite messy…
…communal brains can build useful ontologies, but the problem is…
Ontologies are HARD!
How are ontologies usually constructed?
By small, hard-working, dedicated groups with lots of money!
• Gene Ontology & code
– Curated: ~5 full-time staff
– ~$25 Million (Lewis, S., personal communication)
• NCI Metathesaurus & code
– Curated: ~12 full-time staff
– ~$15 Million (Peter K., estimate)
• Health Level 7 (HL7)
– Curated
– $Lots… Some claim as much as $15 Billion (Smith, Barry, KBB Workshop, Montreal, 2005)
To build the global Semantic Web for Systems Biology we need to encode knowledge from EVERY domain of biology – from barley root apex structure and function, to HIV clinical-trials outcomes… and this knowledge is constantly changing!
At >$15M each, can we afford the Semantic Web???
iCAPTURer experiment
Mmmm… Need MORE Brains!!
Dr. Bruce McManus with a human heart in his hands
He knows his hearts…
…but he doesn’t know how to build an ontology
What we need
The Problem
The Solution?
So… how do we do it?
Remember what we learned from Moby…
…communities CAN build ontologies!
Building Systems Biology Ontologies through Human Computation
iCAPTURer
Benjamin Good
Ph.D. Student, UBC Bioinformatics
Genome BC Better Biomarkers in Transplantation project, St. Paul’s Hospital iCAPTURE Centre
Old Way
• A knowledge engineer (KE) drills the brain of one or a very few experts.
• Painful, expensive, and time-consuming…
New Way? – the iCAPTURer
• KE creates a clever interface
• No direct interaction with expert
• Thousands of experts
• Cheap Cheap Cheap!
iCAPTURer 1.0
Go to a scientific conference
Text-mine conference abstracts
Auto-Extract concepts
Put concepts into a series of question “templates”
A web interface presents questions about these concepts to conference attendees
Give points for every question they answer
Give a prize to the highest point winner
Results
Is _____ a meaningful term?
– Yes, No, I don’t know buttons
What is a synonym for ______?
– Text entry box
Where does _____ fit in the following tree of related terms?
– Clickable tree
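The template-and-points mechanism described for iCAPTURer 1.0 can be sketched roughly as below. This is an illustrative Python sketch, not the iCAPTURer code: `TEMPLATES`, `make_questions` and `Scoreboard` are hypothetical names, the tree-placement template is left out, and one point per answered question is an assumed scoring rule.

```python
from collections import Counter

# Question templates in the spirit of the ones listed above.
TEMPLATES = {
    "meaningful": "Is {term} a meaningful term?",
    "synonym": "What is a synonym for {term}?",
}


def make_questions(terms):
    """Fill every template with every text-mined term."""
    return [
        (kind, term, text.format(term=term))
        for term in terms
        for kind, text in TEMPLATES.items()
    ]


class Scoreboard:
    """Store answers and award one point per answered question (assumed rule)."""

    def __init__(self):
        self.points = Counter()
        self.answers = []

    def record(self, attendee, kind, term, answer):
        self.answers.append((attendee, kind, term, answer))
        self.points[attendee] += 1

    def winner(self):
        """Attendee with the most points so far."""
        return self.points.most_common(1)[0][0]
```

Each recorded answer is one captured piece of knowledge, and the scoreboard supplies the competitive incentive.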
Knowledge Points Captured
464
340
207
1011 total
Observations
Yes/No questions work well
Text entry is less effective
Adding to a tree is a disaster!
Competition is a great motivator for human computation!
COST?
< $15,000,000
iCAPTURer 1.5
Start with hypothetical concept tree
Put concept-concept relations into a series of true/false questions
Make a web interface to ask questions
If a relationship is false, then re-start at the root of the concept tree
Give points for every question they answer
Give a prize to the highest point winner
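One possible reading of the iCAPTURer 1.5 loop, sketched in Python. The slides do not spell out the algorithm, so this is an assumption-laden illustration: `review_tree`, the `ROOT` marker and the `is_true` callback (standing in for an expert's answer) are hypothetical, and "re-start at the root" is interpreted here as detaching the concept and sending it back to the root for later re-placement.

```python
ROOT = "ROOT"


def review_tree(parent_of, is_true):
    """Ask one true/false question per proposed child -> parent link.

    parent_of: dict mapping each concept to its proposed parent.
    is_true:   callable(child, parent) -> bool, standing in for an expert.
    Returns (reviewed tree, list of questions asked).
    Links judged false are dropped and the child is sent back to the root.
    """
    reviewed = {}
    asked = []
    for child, parent in parent_of.items():
        if parent == ROOT:
            # Top-level concepts have no link to verify.
            reviewed[child] = ROOT
            continue
        asked.append(f"I've heard that a {child} is a type of {parent}. Is this true?")
        reviewed[child] = parent if is_true(child, parent) else ROOT
    return reviewed, asked
```

Under the assumed scoring rule of one point per answer, a participant's score would be `len(asked)`.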
“Chatterbot”
“I’ve heard that a cardiac myocyte is a type of cardiac cell. Is this true?”
“I’ve heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?”
“How do you feel about your mother?”
Results
Knowledge capture in 3 days
>11,000 Concepts
COST
$0
Full details of this experiment are available in: Proceedings of the Pacific Symposium on Biocomputing, 2006
Ontology Quality?
Potential Ontology Evaluation Metrics
• Domain independent
– philosophical desiderata: manual, subjective
– graphical structure: auto, questionable value
– satisfiability: auto, useful, not enough
• Domain specific
– “Fit” to text: auto, dependent on NLP
– Similarity to a gold standard: auto/manual; gold standard must exist!
– Task-based: optimal! Auto/manual, but not generalizable
“Good”???
What do we mean by “Good”?
Ontology construction is “motivated by the goal of alignment not on concepts but on the universals in reality and thereby also on the corresponding instances” - Barry Smith
Reality should be the benchmark for the “goodness” of an ontology
ontology evaluation based on referents in reality
Chosen Philosophical Principle
“Epistemology Precedes Ontology”
• A Class should refer to an invariant pattern of properties common among all its instances
– Mammals have mammary glands and hair
– Humans are an instance of the class Mammal
• Therefore…
– If class-instances are mapped into an ontology
– Each instance has “properties” or “qualities”
– These properties or qualities SHOULD segregate into different classes if the ontology is any good
Philosophical Desiderata
• Non-vagueness
– at least one instance can exist with the Class pattern
– Vague class: “mammalian cell wall”
• Non-ambiguity
– no more than one common pattern per Class
– Ambiguous class: “cell” (e.g. cell phone, jail cell)
• Non-redundancy
– within the same level of granularity, no other class refers to same common properties
– Redundant classes: “human”, “homo sapiens”
Cimino, J, 1998
Realist Evaluation: Step 1
Table of Instance-Properties
Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...
(Test one class at a time)
[Figure: class diagram with classes A, B and C and instances I.1, I.2, I.3, I.4]
Realist Evaluation: Step 2
Machine Learning
Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...
Pattern: If char1 = Y then Class X
Class B score for this pattern: 100%
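To make the pattern-finding step concrete, here is a small Python sketch of a single-property rule learner applied to the four rows listed in the table. It is a stand-in written for illustration (the talk uses WEKA, and this is not WEKA code); `ROWS` and `one_rule` are hypothetical names, and the elided "..." rows are simply left out.

```python
# The four listed rows of the instance-property table: (id, properties, in Class B?).
ROWS = [
    ("I.1", {"Char1": "Y", "Char2": "N"}, True),
    ("I.2", {"Char1": "Y", "Char2": "Y"}, True),
    ("I.3", {"Char1": "N", "Char2": "N"}, False),
    ("I.4", {"Char1": "N", "Char2": "Y"}, False),
]


def one_rule(rows):
    """Return (property, rule, accuracy) for the best single-property rule.

    The rule maps each value of the property to the majority class label
    seen for that value; accuracy is the fraction of rows it gets right.
    """
    best = None
    for prop in rows[0][1]:
        labels_by_value = {}
        for _, props, label in rows:
            labels_by_value.setdefault(props[prop], []).append(label)
        rule = {
            value: labels.count(True) >= labels.count(False)
            for value, labels in labels_by_value.items()
        }
        correct = sum(rule[props[prop]] == label for _, props, label in rows)
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (prop, rule, accuracy)
    return best
```

On these rows the learner selects Char1, with the rule "Y means member, N means non-member" covering every instance, which corresponds to the 100% pattern score on the slide.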
WEKA
Produced by the University of Waikato in New Zealand
An open source library containing implementations of hundreds of machine learning algorithms (rule learners, LDA, SVM, neural networks...)
Realist Evaluation
[Figure: the instance-property table (Instance, Char1, Char2, Class 1?) is evaluated for each class in turn, giving a class score for each class; example scores shown: 0.35, 0.1, 0.92]
Realist Evaluation - positive control
1. Identify an ontology that already has logical constraints on the properties of its classes.
2. Assemble instances that have those properties
3. Classify the instances with a reasoner
4. Remove class restrictions from the ontology, but keep instances assigned to their classes
5. Look for patterns of instance properties
6. If successful, patterns should be detected
7. The higher the pattern score, the “gooder” the ontology is
Positive Control: Phosphabase
• An ontology describing different classes of phosphatase enzymes.
• Given the domain composition of a protein, phosphatase class can be inferred automatically.
Wolstencroft et al. (2006) Protein classification using ontology classification. Bioinformatics, Vol. 22 no. 14, pages 530–538
Remove the Logical Rules
• Remove the defining rules for each class
• Maintain the classified instances
• Execute the realist evaluation
• Can we re-discover the patterns that the logical class-rules used to dictate?
Realist Evaluation Positive Control
• 25 classes from Phosphabase tested on 700 simulated protein instances
• For 21 classes, the pattern was correctly identified for 100% of instances
• For the 4 others, patterns were identified covering 99, 92, 82 and 82% of instances respectively
Realist Evaluation Positive Control
• So the Phosphabase ontology is “good”
• We can detect strong patterns of properties in its instances that follow the philosophical desiderata
• This is unsurprising, since we knew that it was “good” in the first place…
Evaluation of Gene Ontology is ongoing…
Interesting side effect…
Class-defining rules are generated by the realist evaluation
Most existing bio-ontologies lack formal class-definitions
This evaluation could be used to create such rules, and hence automatic classifiers
Can also detect what TYPE of property is best “classified” by current bio-ontologies
Is Realist Evaluation a Valid metric?
The realist evaluation measures the success of an ontology in classifying a specific set of properties
We claim that this is a metric relating to the quality of that ontology
Is this metric any better than other metrics such as graph complexity or fit-to-text?
Evaluating metrics
OntoLoki – Making mischief with Ontologies
1. Take an ontology that we claim is “good”
2. Make it worse by mischievously adding changes
3. Measure the degree of “mischief”
4. Run the evaluation metric of interest
5. The metric score should correlate with the amount of mischief added
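The OntoLoki steps above can be sketched in Python on a toy data set. Everything here is a hypothetical illustration rather than the OntoLoki implementation: the "good" ontology is a synthetic set of instances whose class depends only on Char1, mischief is modelled as flipping class labels on a fixed fraction of instances, and the metric is the accuracy of the best single-property rule.

```python
def make_instances(n=20):
    """Build n instances whose class depends only on Char1 (a 'good' ontology)."""
    rows = []
    for i in range(n):
        char1 = "Y" if i % 2 == 0 else "N"
        char2 = "Y" if (i // 2) % 2 == 0 else "N"  # deliberately uninformative
        rows.append(({"Char1": char1, "Char2": char2}, char1 == "Y"))
    return rows


def add_mischief(rows, fraction):
    """Flip the class label of the first `fraction` of the instances."""
    k = round(len(rows) * fraction)
    return [
        (props, (not label) if i < k else label)
        for i, (props, label) in enumerate(rows)
    ]


def class_score(rows):
    """Best accuracy achievable by a majority-vote rule on a single property."""
    best = 0.0
    for prop in rows[0][0]:
        groups = {}
        for props, label in rows:
            groups.setdefault(props[prop], []).append(label)
        correct = sum(max(g.count(True), g.count(False)) for g in groups.values())
        best = max(best, correct / len(rows))
    return best


def mischief_curve(fractions=(0.0, 0.1, 0.2, 0.3)):
    """Score the ontology after each amount of mischief."""
    clean = make_instances()
    return [(f, class_score(add_mischief(clean, f))) for f in fractions]
```

For a quality metric that behaves well, the scores returned by `mischief_curve()` should fall steadily as the fraction of mischief rises; a metric whose score did not track the mischief would be a poor one.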
Comparison of ontology quality metrics
[Figure: two panels, “Good Metric” and “Bad Metric”, each plotting measured ontology quality (y-axis) against amount of noise added, i.e. ontology quality decreasing (x-axis)]
Is Realist Evaluation a good metric?
Let’s OntoLoki it to find out!
OntoLoki test of Realist Evaluation Metric
[Figure: average class score (y-axis, -0.1 to 0.6) against noise added, a measure of nodes affected (x-axis, 0 to 0.25). Series: OneR_avg_mean_KBi, Chi25_Jrip_avg_mean_KBi, Jrip_avg_mean_KBi, ZeroR_avg_mean_KBi. Annotation: “Good Metric!”]
Conclusion
Human computation can collect significant amounts of knowledge in an organized way
OntoLoki seems to be effective at evaluating the evaluation metrics
Realist evaluation is an interesting new metric for testing ontologies
Subjective iCAPTURer Observations
Humans had an EXTREMELY difficult time classifying concepts into pre-existing categories
Humans had an EXTREMELY difficult time defining new categories and placing them into the existing classification system
Classification is HARD!
Abandoning Classification (briefly…)
An ontology is a representation of knowledge
[Figure: example ontology. Nodes: Animal, Mammal, Primate, Lemur, Human, Gorilla, Hair, Big, Medium, Small. Labelled links: is_a, has, has_size.]
Classes, instances
properties, relationships
AN ontology is ONE representation of knowledge
[Figure: two ontologies side by side. Ontology of Anatomy: Animal, Mammal, Primate, Lemur, Human, Gorilla, Hair, Big, Medium, Small; links is_a, has, has_size. Ontology of Habitat: Animal, African_animal, Southern_African_animal, Aquatic, plains, mountain, Africa; links is_a, lives.]
Also might want… Odour, # digits, bone density, friendliness, cuteness…
Clay Shirky: Ontology is Overrated…
• Attempts to predict the future
– “Soviet Union” used to be a category in the Library of Congress
• Attempts mind-reading
– Size, location, odour… Authors must predict what users are interested in
• Great minds don’t think alike…
– No two people are likely to create the same ontology
http://www.shirky.com/writings/ontology_overrated.html
Categories
Properties
Mass Collaborative Tagging
BRAINS!! MORE BRAINS!!
Mass Open Social Tagging
A rapidly growing trend on the Web
Unstructured
Mass-collaboration
Anyone can say anything about anything using any words they wish
Connotea: Scientific Tagging
(Connotea is a product of Nature Publishing Group)
Connotea Growth
Tagging is EASY!
The Tagged World
Tagging is easy!
Tagging costs nothing
Tagging empowers all viewpoints
Tagging is happening!!!!!!
Lexical Comparison of Tagging with Formal Indexing Systems and Ontologies
[Figure: six radar charts, one per vocabulary. Ontology (FMA): FMA Preflabels (11); Ontology (GO Molecular Function): GO_MF (15); Ontology (GO Biological Process): GO_BP (13); Tagging (Bibsonomy): Bibsonomy (20); Tagging (CiteULike): CiteUlike (22); Tagging (Connotea): Connotea (21). Axes on every chart: % OLP uniterms, % OLP duplets, OLP flexibility, % containedByAnother, standard deviation of term length, skewness of term length, complements, compositions.]
Ontologies and Folksonomies are fundamentally different!
[Figure: the radar charts for GO_MF (15) and Bibsonomy (20) shown side by side, with the same axes: % OLP uniterms, % OLP duplets, OLP flexibility, % containedByAnother, standard deviation of term length, skewness of term length, complements, compositions.]
Problem??
Folksonomies and ontologies are fundamentally different!
It may not be possible to derive one from the other accurately
Nevertheless, we would like to take advantage of tagging behaviour while gaining the power of controlled vocabularies/Ontologies
E.D.: The Entity Describer
User types in all tags
Type-ahead displays previously used tags
Connotea tagging
Connotea + E.D. Tagging
Leveraging Tagging?
“Tagging” effectively assigns properties to entities
ED Tagging constrains those properties to a controlled vocabulary or ontology
Can we discover patterns in those properties that indicate a “natural” classification system?
Can a “realist-evaluation” generate logical rules that define classes based on patterns of tags?
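A rough Python sketch of what constraining tags to a controlled vocabulary could look like. This is not the Entity Describer's implementation: the vocabulary, the `TERM:` identifiers, and the `suggest` and `tag` helpers are hypothetical and exist only to illustrate how a tag becomes a property drawn from a fixed term list.

```python
# Hypothetical controlled vocabulary: preferred label -> term identifier.
VOCABULARY = {
    "myocardial infarction": "TERM:0001",
    "myocyte": "TERM:0002",
    "myosin": "TERM:0003",
}


def suggest(prefix, vocabulary=VOCABULARY):
    """Type-ahead: vocabulary labels that start with what the user has typed."""
    typed = prefix.strip().lower()
    if not typed:
        return []
    return sorted(label for label in vocabulary if label.startswith(typed))


def tag(entity, label, annotations, vocabulary=VOCABULARY):
    """Attach a tag to an entity only if it resolves to a vocabulary term."""
    key = label.strip().lower()
    if key not in vocabulary:
        return False
    annotations.setdefault(entity, set()).add(vocabulary[key])
    return True
```

Because every accepted tag resolves to a term identifier, the resulting annotations form an instance-property table of the kind a realist evaluation could search for patterns.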
Final Thoughts
Ontologies are important, but hard to build
iCAPTURer: formal, template-based, cost-free consumption of biologists’ brains seems to work!
Informal annotation (tagging) is cheap, easy, and scalable, and is HAPPENING
Can we leverage tagging to create ontology-like structures? Maybe… Maybe not!
My journey back to Web Services
Why do I care about Web Services (WS) so passionately?
The Deep Web
All the data and knowledge only accessible through Web Forms
Estimated to be orders of magnitude greater than the “surface Web”
- 91,000 Terabytes in the Deep Web
- 167 Terabytes in the Surface Web
Much of the Deep Web CANNOT be represented on the Semantic Web since it DOES NOT EXIST until the Web Form is accessed
Moby 2.0 and CardioSHARE
Merging the Deep Web and the Semantic Web
What Web Services do
Sequence Data --> BLAST SERVICE --> Blast Hit
What BioMoby does
Sequence Data --> MOBY BLAST SERVICE --> Blast Hit
Want Blast
The implied relationship between input and output
Sequence Data --givesBlastResult--> Blast Hit
Not “Biologically” Meaningful
The implied biological relationship between input and output
Sequence Data --hasHomologyTo--> Blast Hit
URI --hasHomologyTo--> URI
…looks a lot like the RDF statement…
To merge Web Services and the Semantic Web…
…Simply assert the relationship and let Moby do the rest!
Start with a partial Triple
URI --rdf:type--> Sequence
URI --hasHomologyTo--> (object not yet known)
What Moby 2.0 Does
[Figure: a URI with rdf:type Sequence is passed to the MOBY BLAST SERVICE, which returns a Blast Hit linked to it by hasHomologyTo]
Moby 2.0: Predicate to Web Service Translator
hasHomologyTo property provided by BLAST services
Need BLAST Service consuming rdf:type Sequence
Moby 2.0 Query
FIND SERVICES THAT
• Consume Sequence Data
• Provide hasHomologyTo Property
• Attached to other Sequence Data
Moby 2.0 extends SPARQL
SPARQL queries contain concepts and relationships of interest
Map RDF predicates onto Moby services capable of generating them
Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
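The registry lookup described above can be sketched as follows. This Python sketch is hypothetical throughout and is not Moby 2.0 code: the index keyed on (subject type, predicate), the `mobyBlast` service name, the `stub_blast` function and the example URIs are invented for illustration, and the stub returns made-up hits instead of calling a real BLAST service.

```python
# Hypothetical registry: (subject rdf:type, predicate generated) -> service names.
SERVICE_INDEX = {
    ("Sequence", "hasHomologyTo"): ["mobyBlast"],
}


def stub_blast(subject_uri):
    """Stand-in for a remote BLAST call; returns made-up hit URIs."""
    return [f"{subject_uri}/hit/1", f"{subject_uri}/hit/2"]


# Hypothetical table of callable services, keyed by registered name.
SERVICES = {"mobyBlast": stub_blast}


def resolve(subject_uri, subject_type, predicate):
    """Complete the partial triple (subject, predicate, ?) by calling services.

    Looks up services that consume `subject_type` and generate `predicate`,
    runs each one on the subject, and returns the resulting triples.
    """
    triples = []
    for name in SERVICE_INDEX.get((subject_type, predicate), []):
        for obj in SERVICES[name](subject_uri):
            triples.append((subject_uri, predicate, obj))
    return triples
```

A query engine could call something like `resolve` whenever a triple pattern has no stored match, so that the missing statements are generated on demand from the Deep Web.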
But wait, there’s more!
CardioSHARE: Exploit knowledge in OWL-DL ontologies to enhance query
Subject, Predicate: look up and execute the Moby service that consumes proteins and generates the functional annotation property
Subject, Predicate: look up and execute the Moby service that consumes STKs and provides the inhibitor property
Evaluate Query Expression
This SPARQL query could be posed on a database of RAW, UNANNOTATED protein sequences, and be answered by Moby 2.0
What do Moby 2.0 and CardioSHARE achieve?
Makes the Deep Web transparently accessible as if it were a Semantic Web Resource
Allows SPARQL to do truly semantic queries!
Reduces the need for biologists to know how/where to get their data of interest
Simplifies construction of complex analytical pipelines by automating much of the discovery/execution tasking
Fin