Kidney and Urinary Pathways Knowledge Base (part of e-LICO). Simon Jupp, University of Manchester. Bio-ontologies, Boston, July 9, 2010

Talk at the Bio-ontologies SIG at ISMB, Boston, 2010, on the KUPKB; presented by Simon Jupp.

Page 1: Kidney and Urinary Pathways Knowledge Base (part of e-LICO)

Kidney and Urinary Pathways Knowledge Base (part of e-LICO)

Simon Jupp, University of Manchester

Bio-ontologies, Boston, July 9, 2010

Page 2

Kidney and Urinary Knowledge Base and Ontology

KUP KB (RDF store)

Specialised repository of KUP-related data

KUP ontology for integration, query and inference

Background knowledge for data-mining experiments

Collaborative update by the community

Page 3

Chronic Renal Disease

Obstructive nephropathy

- leading cause of end-stage renal disease in children.

Dialysis or transplantation

- $8,000 per patient

A plumbing problem: kidney → ureter → bladder → urine

Page 4

Collecting data

Proteome, metabolome, genome

Samples: urine, tissue

Platforms: CE-MS, antibody array, LC-MS/MS, miRNA array, mRNA array

[Figure: example LC-MS/MS fragmentation spectrum, intensity vs m/z, with annotated b- and y-ion peaks]

Page 5

Genome OR Proteome OR Metabolome

Identification of pathways instead of molecules

Page 6

Genome AND Proteome AND Metabolome

Identification of pathways instead of molecules

Identification of nodes in the pathophysiology of obstruction

Page 7

e-LICO

Expression data

KUP KB (RDF store)

Text mining / image mining

New models and hypotheses

Further wet-lab experiments

e-LICO is an EU FP7 project: e-Laboratory for Interdisciplinary Collaborative research in data mining and data-intensive sciences.

http://www.e-lico.eu

Page 8

KUP KB (RDF store)

Use Semantic Web technologies (RDF/OWL) for this part of our infrastructure

Page 9

REQUIREMENTS

Need a low-cost platform for data integration

Flexible data model
– Community extensions

Use of controlled vocabularies
– Ontologies for query and inferencing

KUP KB requirements

Page 10

Kidney and Urinary Pathway Knowledge Base

1. Background knowledge for data-mining experiments

2. Repository of KUP experiments

http://www.e-lico.eu/kupkb

-omics data

Experimental data

Page 11

KUP KB prototype

Currently contains a set of example queries that use the KUP ontology to query the data:

– Which human genes have evidence for upregulation in the glomerulus?
– In which tissue is "PLA2G4A" expressed, and in which biological processes does it participate?
– Which proteins participate in TGF-beta signaling pathways, and where are they upregulated in the kidney?
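To make the idea of an example query concrete, here is a minimal sketch of how the first question could be evaluated as a conjunction of triple patterns, in the spirit of a SPARQL basic graph pattern. It uses a tiny in-memory list rather than the actual store, and the predicates (`kupo:taxon`, `kupo:upregulated_in`) and gene identifiers are hypothetical placeholders, not real KUPO terms or KUP KB data.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy data: predicates and gene IDs are placeholders, not KUPO terms.
TRIPLES: List[Triple] = [
    ("gene:A", "kupo:taxon", "Homo sapiens"),
    ("gene:A", "kupo:upregulated_in", "kupo:glomerulus"),
    ("gene:B", "kupo:taxon", "Mus musculus"),
    ("gene:B", "kupo:upregulated_in", "kupo:glomerulus"),
    ("gene:C", "kupo:taxon", "Homo sapiens"),
    ("gene:C", "kupo:upregulated_in", "kupo:proximal_tubule"),
]


def match(pattern: Triple, binding: Binding) -> List[Binding]:
    """Match one triple pattern against TRIPLES, extending `binding`.
    Terms starting with '?' are variables; anything else must match exactly."""
    results: List[Binding] = []
    for triple in TRIPLES:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(extended)
    return results


def query(patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns, like a SPARQL WHERE clause."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, old)]
    return bindings


# "Which human genes have evidence for upregulation in the glomerulus?"
answers = query([
    ("?gene", "kupo:taxon", "Homo sapiens"),
    ("?gene", "kupo:upregulated_in", "kupo:glomerulus"),
])
print(sorted(a["?gene"] for a in answers))  # ['gene:A']
```

Each pattern narrows the set of variable bindings produced by the previous one, which is essentially how a join over a triple store behaves.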

Page 12

Querying the graph

[Figure: RDF graph joining three sources. KUPO Ontology: kupo:002444 (rdfs:label "PT epithelial cell") and kupo:004672 (rdfs:label "DT epithelial cell"), each linked by ro:part_of to an MA anatomy term (Proximal tubule, Distal tubule). Entrez Gene: Gene X annotated with GO:0054426 (go:biological_process). Higgins dataset: Gene X and Gene Y linked by kupo:expressed_in to MA terms.]

Query: Which genes involved in protein transport are expressed in proximal tubule epithelial cells?
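A sketch of the join behind this query, assuming a toy set of triples shaped like the figure: find the cell by its rdfs:label, follow ro:part_of to the anatomical structure, then intersect the genes expressed there with the genes annotated to the process. All identifiers, including `kupo:participates_in` and `go:protein_transport`, are hypothetical placeholders.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Toy graph shaped like the figure; every identifier here is a hypothetical placeholder.
TRIPLES: Set[Triple] = {
    ("kupo:cell_pt", "rdfs:label", "PT epithelial cell"),
    ("kupo:cell_pt", "ro:part_of", "ma:proximal_tubule"),
    ("kupo:cell_dt", "rdfs:label", "DT epithelial cell"),
    ("kupo:cell_dt", "ro:part_of", "ma:distal_tubule"),
    ("gene:X", "kupo:expressed_in", "ma:proximal_tubule"),
    ("gene:X", "kupo:participates_in", "go:protein_transport"),
    ("gene:Y", "kupo:expressed_in", "ma:distal_tubule"),
    ("gene:Y", "kupo:participates_in", "go:protein_transport"),
}


def objects(subject: str, predicate: str) -> Set[str]:
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate: str, obj: str) -> Set[str]:
    """All s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def genes_for(cell_label: str, process: str) -> Set[str]:
    """Genes in `process` that are expressed in the structure the labelled cell is part of."""
    cells = subjects("rdfs:label", cell_label)
    structures: Set[str] = set()
    for cell in cells:
        structures |= objects(cell, "ro:part_of")
    expressed: Set[str] = set()
    for structure in structures:
        expressed |= subjects("kupo:expressed_in", structure)
    return expressed & subjects("kupo:participates_in", process)


print(genes_for("PT epithelial cell", "go:protein_transport"))  # {'gene:X'}
```

Because expression is recorded against the anatomy term here, the answer for the cell is reached through ro:part_of; that routing is a simplification for illustration.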

Page 13

KUP KB: KUP ontology (alpha)

[Figure: fragment of the KUP ontology linking Anatomy (MAO), Cells (CTO) and Gene biological processes (GO) through subClassOf, part-of and participates-in relations, with asserted and inferred relationships distinguished. Terms shown: Kidney cortex; Renal proximal tubule, Proximal straight tubule, Proximal convoluted tubule; Proximal tubule epithelial cell, Proximal straight tubule epithelial cell, Proximal convoluted tubule epithelial cell; Renal sodium absorption, Renal sodium ion absorption.]

Each kidney cell is currently described by its localisation and function
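The split between asserted and inferred relationships can be illustrated with a hand-rolled sketch. It assumes a defined class "proximal tubule epithelial cell" equivalent to "epithelial cell and part-of some renal proximal tubule", treats part-of as transitive and inherited down subClassOf, and then checks which cells satisfy the definition. This is not an OWL reasoner, and the axioms are assumptions rather than the actual KUPO definitions.

```python
from typing import Dict, Set, Tuple

# Simplified, hypothetical axioms using terms from the slide.
SUBCLASS: Dict[str, Set[str]] = {
    "proximal straight tubule": {"renal proximal tubule"},
    "proximal convoluted tubule": {"renal proximal tubule"},
    "proximal straight tubule epithelial cell": {"epithelial cell"},
    "proximal convoluted tubule epithelial cell": {"epithelial cell"},
}
# "X SubClassOf part-of some Y" restrictions.
PART_OF: Dict[str, Set[str]] = {
    "proximal straight tubule epithelial cell": {"proximal straight tubule"},
    "proximal convoluted tubule epithelial cell": {"proximal convoluted tubule"},
    "renal proximal tubule": {"kidney cortex"},
}
# Defined classes: name -> (genus, whole), read as "genus and part-of some whole".
DEFINED: Dict[str, Tuple[str, str]] = {
    "proximal tubule epithelial cell": ("epithelial cell", "renal proximal tubule"),
}


def ancestors(cls: str) -> Set[str]:
    """cls plus all its asserted superclasses (transitive)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def part_of_closure(cls: str) -> Set[str]:
    """Everything cls is part of, treating part-of as transitive,
    inherited by subclasses, and implying part-of each superclass of the whole."""
    wholes: Set[str] = set()
    frontier = [cls]
    while frontier:
        for anc in ancestors(frontier.pop()):
            for whole in PART_OF.get(anc, set()):
                for w in ancestors(whole) - wholes:
                    wholes.add(w)
                    frontier.append(w)
    return wholes


def inferred_superclasses(cls: str) -> Set[str]:
    """Defined classes whose genus and part-of condition cls satisfies."""
    return {
        name
        for name, (genus, whole) in DEFINED.items()
        if genus in ancestors(cls) and whole in part_of_closure(cls)
    }


for cell in SUBCLASS:
    if cell.endswith("epithelial cell"):
        print(cell, "->", inferred_superclasses(cell))
```

Only the segment-level cells are asserted; their placement under "proximal tubule epithelial cell", and their location in the kidney cortex, falls out of the definitions.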

Page 14

The KUPO development process

[Figure: KUPO development workflow with the stages collaborative spreadsheet, individual spreadsheet, issue tracker, OPPLScript formulation, generate OWL, reasoned ontology, view ontology]
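The slides do not show the OPPL scripts themselves. As a loose stand-in for the "spreadsheet to OWL" step, the sketch below turns hypothetical spreadsheet rows (columns `label`, `parent`, `part_of`) into OWL Manchester-syntax class frames by plain string templating. The column names, the `kupo:` identifier scheme and the term "kidney structure" are assumptions; this is not OPPL, and prefix declarations are omitted.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical spreadsheet export; the column names are assumptions.
SHEET = """label,parent,part_of
proximal straight tubule epithelial cell,epithelial cell,proximal straight tubule
proximal convoluted tubule epithelial cell,epithelial cell,proximal convoluted tubule
kidney cortex,kidney structure,
"""


def to_id(label: str) -> str:
    """Derive a hypothetical kupo: identifier from a free-text label."""
    return "kupo:" + re.sub(r"\W+", "_", label.strip().lower())


def row_to_frame(row: Dict[str, str]) -> str:
    """Render one spreadsheet row as an OWL Manchester-syntax class frame."""
    lines = [
        f"Class: {to_id(row['label'])}",
        f'    Annotations: rdfs:label "{row["label"]}"',
        f"    SubClassOf: {to_id(row['parent'])}",
    ]
    if row.get("part_of"):  # an empty cell means "no part-of restriction"
        lines.append(f"    SubClassOf: ro:part_of some {to_id(row['part_of'])}")
    return "\n".join(lines)


frames: List[str] = [row_to_frame(r) for r in csv.DictReader(io.StringIO(SHEET))]
print("\n\n".join(frames))
```

Keeping domain experts in the spreadsheet and generating the axioms mechanically means the logical patterns stay uniform across rows.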

Page 15

KUP KB: -omics data

[Figure: asserted relationships for one gene. geneid:17638 (type Entrez Gene ID, symbol Fasl) encodes AC18765 (type UniProt ID, symbol Fas-ligand); a participates-in edge connects to has:00527 (type KEGG pathway ID, symbol Apoptosis).]

We can represent -omics data as a graph
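A minimal sketch of what "representing -omics data as a graph" can look like on disk: the Fasl example written out as N-Triples. Only the `rdf:` namespace is real; the other namespaces are `example.org` placeholders because the KUP KB URIs are not given, and the predicate names, the `uniprot:` prefix and the edge directions are one assumed reading of the diagram.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# rdf: is the real RDF namespace; the others are placeholders (the KUP KB URIs are not given).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "geneid": "http://example.org/geneid/",
    "uniprot": "http://example.org/uniprot/",
    "has": "http://example.org/kegg/has/",
    "kupo": "http://example.org/kupo/",
}


@dataclass(frozen=True)
class Literal:
    text: str


Term = Union[str, Literal]


def to_nt(term: Term) -> str:
    """Render a prefixed name as <full IRI>, or a Literal as a quoted string."""
    if isinstance(term, Literal):
        escaped = term.text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


# One reading of the slide's Fasl example; predicate names and edge directions are assumptions.
FASL: List[Tuple[str, str, Term]] = [
    ("geneid:17638", "rdf:type", "kupo:EntrezGeneID"),
    ("geneid:17638", "kupo:symbol", Literal("Fasl")),
    ("geneid:17638", "kupo:encodes", "uniprot:AC18765"),
    ("uniprot:AC18765", "rdf:type", "kupo:UniProtID"),
    ("uniprot:AC18765", "kupo:symbol", Literal("Fas-ligand")),
    ("uniprot:AC18765", "kupo:participates_in", "has:00527"),
    ("has:00527", "rdf:type", "kupo:KEGGPathwayID"),
    ("has:00527", "kupo:symbol", Literal("Apoptosis")),
]


def to_ntriples(triples: List[Tuple[str, str, Term]]) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(f"{to_nt(s)} {to_nt(p)} {to_nt(o)} ." for s, p, o in triples)


print(to_ntriples(FASL))
```

N-Triples is chosen here only because it is line-based and easy to generate; any RDF serialisation would carry the same graph.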

Page 16

KUP KB: experimental data

[Figure: asserted relationships for one experiment. GEO:028364 (type GEO Experiment ID, contributor Higgins et al) has sample KUPO: Proximal straight tubule and an observation, Differentially expressed genes, which contains geneid:17638.]

We can represent experimental data as a graph
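A sketch of the same idea for experimental data, assuming a hypothetical helper that turns one experiment's differentially expressed gene list into triples and a query that asks which genes were reported for a given tissue sample. The identifiers come from the slide; the predicate names and the sample URI `kupo:proximal_straight_tubule` are placeholders.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def experiment_triples(experiment: str, contributor: str, sample: str,
                       genes: Iterable[str]) -> List[Triple]:
    """Describe one experiment and its differentially expressed gene list.
    Predicate names loosely follow the slide's labels but are hypothetical."""
    observation = f"{experiment}/differentially_expressed_genes"
    triples: List[Triple] = [
        (experiment, "rdf:type", "kupo:GEOExperiment"),
        (experiment, "kupo:contributor", contributor),
        (experiment, "kupo:sample", sample),
        (experiment, "kupo:observation", observation),
    ]
    triples += [(observation, "kupo:contains", gene) for gene in genes]
    return triples


def genes_reported_in(triples: List[Triple], sample: str) -> Set[str]:
    """Genes listed in any observation of an experiment on the given sample."""
    experiments = {s for s, p, o in triples if p == "kupo:sample" and o == sample}
    observations = {o for s, p, o in triples
                    if s in experiments and p == "kupo:observation"}
    return {o for s, p, o in triples if s in observations and p == "kupo:contains"}


graph = experiment_triples("GEO:028364", "Higgins et al",
                           "kupo:proximal_straight_tubule", ["geneid:17638"])
print(genes_reported_in(graph, "kupo:proximal_straight_tubule"))  # {'geneid:17638'}
```

Modelling the gene list as its own node (the "observation") leaves room to attach further metadata, such as fold change or thresholds, without changing the experiment node.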

Page 17

Connecting the graphs

[Figure: the experimental-data graph (GEO:028364, Higgins et al, differentially expressed genes) and the -omics graph (geneid:17638 Fasl, AC18765 Fas-ligand, has:00527 Apoptosis) joined to the KUP ontology fragment (Renal proximal tubule and its straight and convoluted segments, their epithelial cells, Renal sodium absorption and Renal sodium ion absorption) via subClassOf, part-of and participates-in relations.]
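Connecting the graphs amounts to taking the union of their triples: a node that appears in both (here geneid:17638) joins them. The sketch below, using hypothetical fragments and placeholder predicates such as `obs:deg_1`, shows that a path from the experiment to the pathway exists only after the union.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical fragments; "obs:deg_1" and the predicate names are placeholders.
experimental: Set[Triple] = {
    ("GEO:028364", "kupo:sample", "kupo:proximal_straight_tubule"),
    ("GEO:028364", "kupo:observation", "obs:deg_1"),
    ("obs:deg_1", "kupo:contains", "geneid:17638"),
}
omics: Set[Triple] = {
    ("geneid:17638", "kupo:encodes", "AC18765"),
    ("AC18765", "kupo:participates_in", "has:00527"),
    ("has:00527", "kupo:symbol", "Apoptosis"),
}


def find_path(triples: Set[Triple], start: str, goal: str) -> Optional[List[Triple]]:
    """Breadth-first search along subject -> object edges.
    Returns the triples on a shortest path, or None if goal is unreachable."""
    edges: Dict[str, List[Triple]] = {}
    for triple in triples:
        edges.setdefault(triple[0], []).append(triple)
    came_from: Dict[str, Optional[Triple]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[Triple] = []
            step = came_from[node]
            while step is not None:
                path.append(step)
                step = came_from[step[0]]
            return path[::-1]
        for triple in edges.get(node, []):
            if triple[2] not in came_from:
                came_from[triple[2]] = triple
                queue.append(triple[2])
    return None


merged = experimental | omics  # adding a dataset is just a union of triples
print(find_path(experimental, "GEO:028364", "has:00527"))  # None
for s, p, o in find_path(merged, "GEO:028364", "has:00527") or []:
    print(s, p, o)
```

The join works only because both fragments use the same identifier for the gene, which is why agreeing on URIs matters so much for this approach.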

Page 18

Bio2RDF

Best practices from the W3C Semantic Web Health Care and Life Sciences Interest Group

Bio2RDF ontology as a schema

KUP KB (RDF store)

Page 19

So why RDF over an RDBMS?

Having a standard representation simply makes my life easier

Lots of heterogeneous KUP data to be integrated
– RDF allows me to simply pile more data in

Natural support for ontologies
– although limited

RDF alone isn’t enough
– Next step: intelligent agents and crawlers…
– How do we harness all this connected data?

Page 20

Challenges

Bad modelling (?)
– Conflation of instances and classes
– Cell bears some function (that is realised in some process) vs. Cell participates in some process

False statements and vague semantics
– Trying to accommodate the biologists’ queries
– Mapping natural language to semantic relationships
– Experiments, expression data, gene lists, etc. It’s hard

Plus a whole list of general Semantic Web related issues

Page 21

Data mining

Data-mining experiments have just started

SPARQL queries generate tables of background knowledge for data-mining tools

Mine results for associations, clusters and predictive models

Build user-friendly tools to hide the underlying technology

Results expected in year 2 (later this year…)
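A sketch of the "SPARQL query to table" step, assuming the endpoint returns results in the W3C SPARQL 1.1 Query Results JSON format. It pivots hypothetical `?gene`/`?pathway` bindings into a 0/1 attribute table (CSV) of the kind many data-mining tools accept; the variable names and values are illustrative only.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical endpoint response in SPARQL 1.1 Query Results JSON format.
RESULTS_JSON = """{
  "head": {"vars": ["gene", "pathway"]},
  "results": {"bindings": [
    {"gene": {"type": "literal", "value": "GeneA"}, "pathway": {"type": "literal", "value": "P1"}},
    {"gene": {"type": "literal", "value": "GeneA"}, "pathway": {"type": "literal", "value": "P2"}},
    {"gene": {"type": "literal", "value": "GeneB"}, "pathway": {"type": "literal", "value": "P2"}}
  ]}
}"""


def bindings_from_json(text: str) -> List[Dict[str, str]]:
    """Flatten each result binding to {variable: value}."""
    data = json.loads(text)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


def to_feature_table(bindings: List[Dict[str, str]], row_var: str, col_var: str) -> str:
    """Pivot bindings into a 0/1 CSV table: one row per row_var value,
    one column per col_var value, 1 where the pair occurred in the results."""
    row_keys = sorted({b[row_var] for b in bindings})
    col_keys = sorted({b[col_var] for b in bindings})
    present = {(b[row_var], b[col_var]) for b in bindings}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([row_var] + col_keys)
    for r in row_keys:
        writer.writerow([r] + [int((r, c) in present) for c in col_keys])
    return out.getvalue()


print(to_feature_table(bindings_from_json(RESULTS_JSON), "gene", "pathway"))
```

For these three bindings the table has a header row `gene,P1,P2` followed by `GeneA,1,1` and `GeneB,0,1`. A binary presence matrix is only one possible encoding; counts or evidence scores would slot into the same pivot.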

Page 22

Summary

Rapid, low-cost data integration
– Thanks to existing community efforts!

Single SPARQL endpoint provides flexible queries
– Especially useful for our data-mining queries

Rapid ontology development
– Spreadsheets to engage domain experts

Page 23

KUP Knowledge Base in e-LICO

http://www.e-lico.eu/kupkb

[Figure: the KUP KB within e-LICO. Components: KUP KB (RDF store), Bio2RDF, Linked Open Data / Semantic Web / bio-ontologies, e-LICO workflows, e-LICO DB, e-LICO data analysis, web interface; flows labelled use case data, raw data, query, results and shared metadata.]

Page 24

Acknowledgements

Julie Klein, Joost Schanstra – Inserm, France

Robert Stevens – University of Manchester

EuroKUP members who have already contributed to the ontology

Page 25

Challenges

KUP KB implemented as a triple store (Sesame)
– Scalable
– Limited inference (RDFS)

Experiments with OWL
– Classification possible (FaCT++)
– DL Query language lacks desirable features
  • Joins, unions, filters, etc.
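To illustrate what "limited inference (RDFS)" covers, here is a minimal sketch that forward-chains two of the RDFS entailment rules, rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit superclass types), over a hypothetical triple set. It illustrates the rules, not Sesame's inferencer. Nothing here can reason over OWL class expressions such as "part-of some X"; that requires an OWL reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS entailment rules until a fixpoint:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)"""
    closure = set(triples)
    while True:
        subs = [(s, o) for s, p, o in closure if p == SUBCLASS_OF]
        derived = {(a, SUBCLASS_OF, d) for a, b in subs for c, d in subs if b == c}
        derived |= {(x, TYPE, b) for x, p, cls in closure if p == TYPE
                    for a, b in subs if a == cls}
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical data: one sample typed with KUPO-style class names (placeholders).
data = {
    ("ex:sample1", TYPE, "kupo:ProximalStraightTubule"),
    ("kupo:ProximalStraightTubule", SUBCLASS_OF, "kupo:RenalProximalTubule"),
    ("kupo:RenalProximalTubule", SUBCLASS_OF, "kupo:KidneyStructure"),
}
for triple in sorted(rdfs_closure(data) - data):
    print(triple)
```

Three triples are added here: the transitive subclass link and two new types for the sample. Everything RDFS derives is of this simple shape.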

Page 26

Challenges 2

Re-use existing RDF datasets
– Bio2RDF could be improved
– URI guidelines unclear
  • PURLs or OBO URIs?

BioPortal, OBO Foundry, Bio2RDF…
– RDF endpoint to BioPortal is great!

Page 27

Challenges 4

Warehoused data
– I don’t want to maintain other people’s data

Linked data and query federation
– What is possible now?
– SADI framework
