Talk at the Bio-ontologies SIG at ISMB, Boston, 2010, on the KUPKB; presented by Simon Jupp
Kidney and Urinary Pathways Knowledge Base (part of e-LICO)
Simon Jupp, University of Manchester
Bio-ontologies, Boston, July 9, 2010
Kidney and Urinary Knowledge Base and Ontology
KUP KB (RDF store)
Specialised repository of KUP related data
KUP ontology for integration, query and inference
Background knowledge for data mining experiments
Collaborative update by the community
Chronic Renal Disease
Obstructive nephropathy
- first cause of end-stage renal disease in children.
Dialysis or transplantation
- 8000$/patient
A plumbing problem
Kidney
Ureter
Bladder
Urine
Collecting data
Proteome, Metabolome, Genome
Samples: urine, tissue
Techniques: CE-MS, antibody array, LC-MS/MS, miRNA array, mRNA array
[Figure: example LC-MS/MS spectrum, intensity (0 to 100) against m/z (600 to 1600), with labelled b and y fragment ions]
Genome OR Proteome OR Metabolome
Identification of pathways instead of molecules
Genome AND Proteome AND Metabolome
Identification of nodes in the pathophysiology of obstruction
e-LICO
Expression data
KUP KB (RDF store)
Text-mining / Image mining
New models and hypotheses
Further wet lab experiments
e-LICO FP7 EU project: e-Laboratory for Interdisciplinary Collaborative research in data-mining and data-intensive sciences.
http://www.e-lico.eu
Use Semantic Web technologies (RDF/OWL) for the KUP KB part of our infrastructure
KUP KB requirements
Need a low-cost platform for data integration
Flexible data model
– Community extensions
Use of controlled vocabularies
– Ontologies for query and inferencing
Kidney and Urinary Pathway Knowledge Base
1. Background knowledge for data-mining experiments
2. Repository of KUP experiments
http://www.e-lico.eu/kupkb
-omics data
Experimental data
KUP KB prototype
Currently contains a set of example queries that use the KUP ontology to query the data:
– Which human genes have evidence for upregulation in the glomerulus?
– In which tissue is "PLA2G4A" expressed and in which biological processes does it participate?
– Which proteins participate in TGF-beta signaling pathways and where are they upregulated in the kidney?
Querying the graph
[Figure: one RDF graph built from three sources. KUPO ontology: kupo:002444 (rdfs:label "PT epithelial cell") and kupo:004672 (rdfs:label "DT epithelial cell"), each linked by ro:part_of to an MA anatomy term (Proximal tubule, Distal tubule). Entrez Gene: Gene X linked by go:biological_process to GO:0054426. Higgins dataset: Gene X and Gene Y linked by kupo:expressed_in to MA anatomy terms.]
Query: Which genes involved in protein transport are expressed in proximal tubule epithelial cells?
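The query on this slide is a join across the three sources. A minimal sketch of that join in plain Python follows, assuming a toy set of triples: the predicate names (go:biological_process, kupo:expressed_in, ro:part_of, rdfs:label) come from the slide, but every node identifier and the `match` / `query` helpers are hypothetical, and the real KUP KB would answer this with SPARQL against the triple store.

```python
# Tiny in-memory triple store with variable-based pattern matching.
# Node identifiers (gene:X, GO:protein_transport, ...) are hypothetical.
TRIPLES = {
    ("gene:X", "go:biological_process", "GO:protein_transport"),
    ("gene:Y", "go:biological_process", "GO:protein_transport"),
    ("gene:X", "kupo:expressed_in", "MA:proximal_tubule"),
    ("gene:Y", "kupo:expressed_in", "MA:distal_tubule"),
    ("kupo:pt_cell", "rdfs:label", "PT epithelial cell"),
    ("kupo:pt_cell", "ro:part_of", "MA:proximal_tubule"),
    ("kupo:dt_cell", "rdfs:label", "DT epithelial cell"),
    ("kupo:dt_cell", "ro:part_of", "MA:distal_tubule"),
}


def match(pattern, triple, binding):
    """Unify one pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Join all patterns, in the spirit of a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for triple in triples
            if (b2 := match(pattern, triple, b)) is not None
        ]
    return bindings


# Genes involved in protein transport, expressed in the anatomy term
# that the "PT epithelial cell" is part of.
patterns = [
    ("?gene", "go:biological_process", "GO:protein_transport"),
    ("?gene", "kupo:expressed_in", "?anatomy"),
    ("?cell", "ro:part_of", "?anatomy"),
    ("?cell", "rdfs:label", "PT epithelial cell"),
]
genes = sorted({b["?gene"] for b in query(patterns, TRIPLES)})
print(genes)
```

The shared variable ?anatomy is what connects the expression data to the ontology: the gene is annotated to an anatomy term, and the cell is reached through its part-of link to the same term.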
KUP KB: KUP ontology (alpha)
Built from: Anatomy (MAO), Cells (CTO), Gene Biological processes (GO)
[Figure: ontology fragment distinguishing asserted from inferred links. Relations shown: subClassOf, part-of, participates-in. Anatomy terms: Kidney Cortex, Renal proximal tubule, Proximal straight tubule, Proximal convoluted tubule. Cell terms: Proximal tubule epithelial cell, Proximal straight tubule epithelial cell, Proximal convoluted tubule epithelial cell. Process terms: Renal sodium absorption, Renal sodium ion absorption.]
Each kidney cell is currently described by its localisation and function
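The difference between asserted and inferred links can be sketched with a small fixpoint computation. This is an illustration only: the four asserted edges are an assumed reading of the diagram, the `infer` function is hypothetical, and the single rule used here (a subclass inherits every link of its superclass) is far weaker than what an OWL reasoner does.

```python
# Assumed edges for illustration; the slide's exact edges are not recoverable.
ASSERTED = {
    ("Proximal straight tubule epithelial cell", "subClassOf",
     "Proximal tubule epithelial cell"),
    ("Proximal convoluted tubule epithelial cell", "subClassOf",
     "Proximal tubule epithelial cell"),
    ("Proximal tubule epithelial cell", "part-of", "Renal proximal tubule"),
    ("Proximal tubule epithelial cell", "participates-in",
     "Renal sodium absorption"),
}


def infer(asserted):
    """Repeat one rule until nothing new appears:
    if A subClassOf B and B <predicate> D, then A <predicate> D.
    With <predicate> = subClassOf this also makes subClassOf transitive.
    """
    triples = set(asserted)
    while True:
        new = set()
        for (a, p, b) in triples:
            if p != "subClassOf":
                continue
            for (c, q, d) in triples:
                if c == b:
                    new.add((a, q, d))
        if new <= triples:
            return triples
        triples |= new


inferred = infer(ASSERTED) - ASSERTED
for triple in sorted(inferred):
    print(triple)
```

Under these assumptions the two specialised cell types pick up the localisation (part-of) and function (participates-in) of the general proximal tubule epithelial cell without anyone asserting them.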
The KUPO development process
[Figure: workflow with the stages Individual Spreadsheet, Collaborative Spreadsheet, Issue Tracker, OPPL Script Formulation, Generate OWL, Reasoned Ontology, View Ontology]
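To give a feel for the spreadsheet-to-OWL step, here is a rough stand-in written in Python rather than OPPL. It is not the KUPO pipeline: the column names (label, parent, part_of), the example.org namespaces and the `local_name` / `rows_to_turtle` helpers are all assumptions made for this sketch, which simply emits one Turtle class definition per spreadsheet row.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per cell type.
SHEET = """label,parent,part_of
Proximal tubule epithelial cell,Epithelial cell,Renal proximal tubule
Proximal straight tubule epithelial cell,Proximal tubule epithelial cell,Proximal straight tubule
"""

# The kupo: and ro: namespaces below are placeholders, not the real ones.
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix kupo: <http://example.org/kupo#> .
@prefix ro: <http://example.org/ro#> .
"""


def local_name(label):
    """Turn a label into a CamelCase local name."""
    return "".join(word.capitalize() for word in label.split())


def rows_to_turtle(text):
    """Emit a named superclass and a part_of restriction for each row."""
    lines = [PREFIXES]
    for row in csv.DictReader(io.StringIO(text)):
        subject = "kupo:" + local_name(row["label"])
        lines.append(f'{subject} a owl:Class ;')
        lines.append(f'    rdfs:label "{row["label"]}" ;')
        lines.append(f'    rdfs:subClassOf kupo:{local_name(row["parent"])} ,')
        lines.append('        [ a owl:Restriction ; owl:onProperty ro:part_of ;')
        lines.append(f'          owl:someValuesFrom kupo:{local_name(row["part_of"])} ] .')
    return "\n".join(lines)


turtle = rows_to_turtle(SHEET)
print(turtle)
```

The point of the design is that domain experts only ever edit the spreadsheet; the OWL is regenerated from it and then passed to a reasoner.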
KUP KB: -omics data
We can represent -omics data as a graph
[Figure: asserted relationships. geneid:17638 (type: Entrez Gene ID, symbol: Fasl) encodes AC18765 (type: UNIPROT ID, symbol: Fas-ligand), which participates-in has:00527 (type: KEGG pathway ID, symbol: Apoptosis).]
KUP KB: experimental data
We can represent experimental data as a graph
[Figure: asserted relationships. GEO:028364 (type: GEO Experiment ID, contributor: Higgins et al) has sample KUPO: Proximal straight tubule and an observation, Differentially expressed genes, which contains Geneid:17638.]
Connecting the graphs
[Figure: the experimental-data graph (GEO:028364, Higgins et al, Differentially expressed genes), the -omics graph (geneid:17638, Fasl, AC18765, Fas-ligand, has:00527, Apoptosis) and the KUP ontology fragment (proximal tubule anatomy terms, epithelial cell terms, Renal sodium absorption) joined into a single graph through their shared nodes.]
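Connecting the graphs amounts to taking the union of their triples; nodes with the same identifier then become join points. The sketch below uses the node and edge labels from the slides, but the exact edge layout is a reading of the flattened diagrams and should be treated as an assumption, as should the `follow` helper. Plain strings stand in for full IRIs.

```python
# -omics graph (edge layout assumed from the slide labels).
OMICS = {
    ("geneid:17638", "type", "Entrez Gene ID"),
    ("geneid:17638", "symbol", "Fasl"),
    ("geneid:17638", "encodes", "AC18765"),
    ("AC18765", "type", "UNIPROT ID"),
    ("AC18765", "symbol", "Fas-ligand"),
    ("AC18765", "participates-in", "has:00527"),
    ("has:00527", "type", "KEGG pathway ID"),
    ("has:00527", "symbol", "Apoptosis"),
}

# Experimental-data graph (edge layout assumed from the slide labels).
EXPERIMENT = {
    ("GEO:028364", "type", "GEO Experiment ID"),
    ("GEO:028364", "contributor", "Higgins et al"),
    ("GEO:028364", "sample", "KUPO: Proximal straight tubule"),
    ("GEO:028364", "observation", "Differentially expressed genes"),
    ("Differentially expressed genes", "contains", "geneid:17638"),
}


def follow(graph, start, path):
    """Follow a sequence of predicates from `start`; return the end nodes."""
    frontier = {start}
    for predicate in path:
        frontier = {o for (s, p, o) in graph if s in frontier and p == predicate}
    return frontier


# Merging two RDF graphs is a set union of their triples.
merged = OMICS | EXPERIMENT

# From an experiment to the pathways of the genes it reports.
path = ["observation", "contains", "encodes", "participates-in", "symbol"]
print(follow(merged, "GEO:028364", path))
print(follow(EXPERIMENT, "GEO:028364", path))
```

The path only completes in the merged graph, because geneid:17638 is the node the two sources share; in the experiment graph alone the walk stops at the gene.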
Bio2RDF
Best practices from the W3C Health Care and Life Sciences Interest Group
Bio2RDF ontology as a schema
KUP KB (RDF store)
So why RDF over an RDBMS?
Having a standard representation simply makes my life easier
Lots of heterogeneous KUP data to be integrated
– RDF allows me to simply pile more data in
Natural support for ontologies
– Although limited
RDF alone isn’t enough
– Next step: intelligent agents and crawlers…
– How do we harness all this connected data?
Challenges
Bad modelling (?)
– Conflation of instances and classes
– "Cell bears some function (that is realised in some process)" vs "Cell participates in some process"
False statements and vague semantics
– Trying to accommodate the biologists' queries
– Mapping natural language to semantic relationships
– Experiments, expression data, gene lists, etc. It’s hard
Plus a whole list of general Semantic Web related issues
Data mining
Data mining experiments have just started
SPARQL queries generate tables of background knowledge for the data mining tools
Mine results for associations, clusters and predictive models
Build user-friendly tools to hide the underlying technology
Results expected in year 2 (later this year…)
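Data mining tools generally expect a flat table, so the query results have to be pivoted into one row per entity. A minimal sketch of that step follows; the input rows are hypothetical (shaped like the bindings of a SPARQL SELECT over gene, feature and value), geneid:00001 is a made-up identifier, and `to_table` / `to_csv` are illustrative names rather than part of e-LICO.

```python
import csv
import io

# Hypothetical bindings, shaped like the rows of a SPARQL SELECT result.
ROWS = [
    {"gene": "geneid:17638", "feature": "pathway", "value": "Apoptosis"},
    {"gene": "geneid:17638", "feature": "expressed_in",
     "value": "Proximal straight tubule"},
    {"gene": "geneid:00001", "feature": "pathway", "value": "TGF-beta signaling"},
]


def to_table(rows):
    """Pivot (gene, feature, value) rows into one record per gene."""
    features = sorted({r["feature"] for r in rows})
    table = {}
    for r in rows:
        record = table.setdefault(r["gene"], {f: "" for f in features})
        # Several values for one feature are joined with '|'.
        record[r["feature"]] = "|".join(
            filter(None, [record[r["feature"]], r["value"]])
        )
    return features, table


def to_csv(rows):
    """Render the pivoted table as CSV text, one line per gene."""
    features, table = to_table(rows)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene"] + features)
    for gene in sorted(table):
        writer.writerow([gene] + [table[gene][f] for f in features])
    return out.getvalue()


print(to_csv(ROWS))
```

Missing values are left as empty cells, so the mining tool can decide how to treat them.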
Summary
Rapid and low-cost data integration
– Thanks to existing community efforts!
Single SPARQL endpoint provides flexible queries
– Especially useful for our data-mining queries
Rapid ontology development
– Spreadsheets to engage domain experts
KUP Knowledge Base in e-LICO
http://www.e-lico.eu/kupkb
[Figure: architecture. Components: KUP KB (RDF store), Bio2RDF, Linked Open Data / Semantic Web / Bio ontologies, e-LICO Workflows, e-LICO DB, e-LICO Data Analysis, Web interface. Labelled flows: Use case data, Raw data, Query, Results, Shared meta-data.]
Acknowledgements
Julie Klein, Joost Schanstra (Inserm, France)
Robert Stevens (University of Manchester)
EuroKUP members who already contributed to the ontology
Challenges
KUP KB implemented as a triple store (Sesame)
– Scalable
– Limited inference (RDFS)
Experiments with OWL
– Classification possible (FaCT++)
– DL query language lacks desirable features (joins, unions, filters, etc.)
Challenges 2
Re-use existing RDF datasets
– Bio2RDF could be improved
– URI guidelines unclear: PURLs or OBO URIs?
BioPortal, OBO Foundry, Bio2RDF…
– RDF endpoint to BioPortal is great!
Challenges 4
Warehoused data
– I don’t want to maintain other people’s data
Linked data and query federation
– What is possible now?
– SADI framework