Ron Henkel's presentation of our Ranked Retrieval approach; 2012 PALs meeting of the Sysmo-SEEK project in Heidelberg, Germany. 28th-30th of November 2012.
29.11.2012 © 2009 UNIVERSITÄT ROSTOCK
Graph based storage and retrieval of computational models
Department of Systems Biology and Bioinformatics University of Rostock
www.sbi.uni-rostock.de
Ron Henkel, Martin Scharm, Dagmar Waltemath, Olaf Wolkenhauer
Motivation
Data from BioModels Database
[Figure: growth of BioModels Database over time. Axes: Number of Models, Number of Annotations; x-axis from Apr 05 to Jul 12 in quarterly steps. Series: Models, Annotation.]
Motivation
• Models:
  - grow in number and complexity
  - are provided with supplementary material
  - evolve over time
State of the Art
• Storage:
  - Relational databases
  - Model files on hard disk drive (HDD)
  - Additional files (images, result sets, paper)
• Search:
  - SQL statements
  - Faceted search
  - Data browsing
State of the Art - Demo
Available Data for Ranked Retrieval
Model file
• Constituent names
• Model code
Annotation & Ontologies
• Biochemical background
• Synonyms
A model's network
• Model structure
• Aggregate values
Available Data for Ranked Retrieval
#  Aspect          Importance  Contained features
1  Administrative  none        ids, file name, version, formalism…
2  Person          medium      creator, encoder, submitter, publication author
3  Dates           low         creation and modification date
4  Publication     high        title, abstract, full-text, journal
5  Constituents    very high   compartment, species, reaction
6  User content    very high   keywords, tags, remarks, changes

• The concept is abstract and can be applied to different model formalisms.
• Depending on the formalism, the aspects can be refined into features.
• The model constituents also contain the annotations.
Henkel et al. (2010) BMC Bioinf
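The importance levels in the aspect table suggest a weighted ranking. The following is a minimal sketch only: the numeric weights, the field names, and the term-overlap scoring are assumptions made for illustration (the slides give qualitative levels, and the actual system builds on Apache Lucene rather than this scoring).

```python
# Hypothetical numeric weights for the qualitative importance levels in the table.
ASPECT_WEIGHTS = {
    "administrative": 0.0,  # none
    "dates": 0.5,           # low
    "person": 1.0,          # medium
    "publication": 2.0,     # high
    "constituents": 3.0,    # very high
    "user_content": 3.0,    # very high
}


def score(model, query_terms):
    """Sum, over aspects, the aspect weight times the number of query terms found there."""
    total = 0.0
    for aspect, text in model.items():
        words = set(text.lower().split())
        hits = sum(1 for term in query_terms if term.lower() in words)
        total += ASPECT_WEIGHTS.get(aspect, 0.0) * hits
    return total


def rank(models, query_terms):
    """Return the ids of models with a positive score, best first."""
    scored = [(score(fields, query_terms), model_id) for model_id, fields in models.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [model_id for s, model_id in ordered if s > 0]


# Toy data, invented for illustration only.
MODELS = {
    "m1": {"publication": "a model of the cell cycle", "constituents": "cyclin cdc2"},
    "m2": {"administrative": "cell cycle file", "dates": "2005"},
    "m3": {"constituents": "cell cycle cyclin", "person": "smith"},
}

print(rank(MODELS, ["cell", "cycle"]))
```

In this toy setup a hit in the constituents outranks a hit in the publication text, and a hit that occurs only in administrative data does not count at all.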
BioModels Database – A Test Case
• Apache Lucene Framework
• Model Index: 425 models, 140,977 terms
• Semantic Index: 2261 URIs, 409,124 terms
http://www.ebi.ac.uk/biomodels-demo/
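One plausible way for a model index and a semantic index to cooperate is sketched below. It assumes that the semantic index maps annotation URIs to ontology text, so that a query term matching that text can reach models annotated with the URI. The sketch uses plain Python dictionaries instead of Lucene, and all model names and texts are invented.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document keys containing it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for term in text.lower().split():
            index[term].add(key)
    return index


# Toy data, invented for illustration only.
MODEL_TEXT = {"model_a": "glycolysis in yeast", "model_b": "calcium oscillations"}
MODEL_ANNOTATIONS = {"model_a": {"GO:0005737"}, "model_b": set()}
URI_TEXT = {"GO:0005737": "cytoplasm"}  # label as an ontology would supply it

model_index = build_inverted_index(MODEL_TEXT)
semantic_index = build_inverted_index(URI_TEXT)


def search(term):
    """Return models matching the term directly or through an annotation URI."""
    term = term.lower()
    direct = set(model_index.get(term, set()))
    uris = semantic_index.get(term, set())
    via_annotation = {m for m, anns in MODEL_ANNOTATIONS.items() if anns & uris}
    return direct | via_annotation


print(search("cytoplasm"))
```

Here `model_a` is found for "cytoplasm" although the word never appears in the model text, because the match goes through the annotation URI.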
Demo
Improvements
• Ranking
• Enhanced query possibilities
  - Required, optional and excluded criteria
  - Full-text and ontology queries
• Example: “Find cell cycle models”

Query         BioModels DB  Using IR  Gold Standard
cell cycle    135           173       n/a
“cell cycle”  14            26        28
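The difference between the unquoted and the quoted query, and the idea of required, optional and excluded criteria, can be illustrated with a small sketch. It assumes that the terms of an unquoted query are treated as alternatives and that a quoted query requires adjacent words. The function names and documents are hypothetical, and this is not how Lucene implements its queries.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches(text, required=(), optional=(), excluded=()):
    """Apply required, optional and excluded phrase criteria to one document."""
    if any(contains_phrase(text, p) for p in excluded):
        return False
    if not all(contains_phrase(text, p) for p in required):
        return False
    # With no required criteria, at least one optional criterion must hit.
    if not required and optional:
        return any(contains_phrase(text, p) for p in optional)
    return True


# Toy documents, invented for illustration only.
DOCS = {
    "d1": "regulation of the cell cycle in yeast",
    "d2": "cell growth and the citric acid cycle",
    "d3": "cell cycle arrest in tumour cells",
}

# Unquoted query: either word may match.
loose = [d for d, t in DOCS.items() if matches(t, optional=["cell", "cycle"])]
# Quoted query: the words must be adjacent.
strict = [d for d, t in DOCS.items() if matches(t, required=["cell cycle"])]
# Quoted query combined with an excluded term.
filtered = [d for d, t in DOCS.items()
            if matches(t, required=["cell cycle"], excluded=["tumour"])]

print(loose, strict, filtered)
```

The quoted form drops `d2`, which mentions both words but not as a phrase. This mirrors why the phrase query in the table returns far fewer hits than the two separate terms.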
Available Data for Ranked Retrieval
Model based
• Model name
• Model code
Annotation & Ontologies
• Biochemical background
• Helps identify synonyms, for example
A model's network
• Include model structure
• Aggregate values
A model's network
• Include model structure
• Aggregate values
Mapping a Model to a Database
Advantages of Graph Databases
• Easy mapping of model structure
• Fast browsing through models
• Flexible and schema-free storage
• Easy linking to models, simulation setups or results, and external resources
[Figure: a model mapped to a graph. Node labels: Document, Model, P, E, C, R, S. Annotation nodes: SBO:0000268, uniprot:P07101, uniprot:Q03393, GO:0005737, HGNC:8582. Edge labels: is, isVersionOf, isEncodedBy, asProduct, asReactant, asModifier.]
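A labelled graph of this kind can be sketched with plain Python structures. The edge labels `is`, `isVersionOf` and `asReactant` and the annotation identifiers come from the slides. Everything else is an assumption made for illustration: the `Graph` class, the `contains` label, the node kinds "Species" and "Reaction", the edge directions, and which node carries which annotation. No particular graph database API is implied.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory labelled graph: nodes carry a kind, edges carry a label."""

    def __init__(self):
        self.kind = {}                # node id -> node kind
        self.out = defaultdict(list)  # node id -> [(edge label, target id)]

    def add_node(self, node, kind):
        self.kind[node] = kind

    def add_edge(self, source, label, target):
        self.out[source].append((label, target))

    def neighbours(self, node, label=None):
        """Targets of outgoing edges, optionally restricted to one label."""
        return [t for lab, t in self.out.get(node, []) if label is None or lab == label]

    def sources(self, target, label=None):
        """Nodes with an edge (optionally of one label) pointing at `target`."""
        return [s for s, edges in self.out.items()
                for lab, t in edges
                if t == target and (label is None or lab == label)]


g = Graph()
for node, kind in [("doc1", "Document"), ("model1", "Model"), ("s1", "Species"),
                   ("r1", "Reaction"), ("uniprot:P07101", "Annotation"),
                   ("GO:0005737", "Annotation")]:
    g.add_node(node, kind)

# "contains" is a hypothetical label; the slides do not name these edges.
g.add_edge("doc1", "contains", "model1")
g.add_edge("model1", "contains", "s1")
g.add_edge("model1", "contains", "r1")
# These labels appear on the slides; the nodes they connect here are invented.
g.add_edge("s1", "is", "uniprot:P07101")
g.add_edge("s1", "isVersionOf", "GO:0005737")
g.add_edge("s1", "asReactant", "r1")


def models_annotated_with(graph, uri):
    """Follow annotation edges backwards, then containment edges, to reach models."""
    found = set()
    for entity in graph.sources(uri):
        for parent in graph.sources(entity, "contains"):
            if graph.kind.get(parent) == "Model":
                found.add(parent)
    return found


print(models_annotated_with(g, "uniprot:P07101"))
```

Because annotations are ordinary nodes, a structural query ("which models contain an entity annotated with this URI?") becomes a short traversal instead of a join across tables.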
Preliminary Results
• All models from BioModels DB were imported into the graph database
• Implemented storage and search in JUMMP; official demo release upcoming
• Added 140,811 models from the path2models project
  - done, but including the annotations exhausts memory
  - the database scales well and is reasonably fast
Demo
Future Work: Relate model versions
• Link successor and predecessor
• Relate changed entities
• Store the diff
• Enable version control for multi-document models
• Propagate changes for imported models
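Linking a predecessor to its successor and storing the diff could look roughly like the sketch below. It assumes each model version is available as a dictionary from entity id to a description. The function names, the record layout and the data are hypothetical and do not describe an existing implementation.

```python
def diff_entities(old, new):
    """Compare two versions given as {entity_id: description} dictionaries."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


def link_versions(history, predecessor, successor, versions):
    """Record a successor link together with the diff between the two versions."""
    history.append({
        "predecessor": predecessor,
        "successor": successor,
        "diff": diff_entities(versions[predecessor], versions[successor]),
    })
    return history


# Toy versions, invented for illustration only.
VERSIONS = {
    "v1": {"s1": "glucose", "s2": "ATP", "r1": "s1 -> s2"},
    "v2": {"s1": "glucose", "s2": "ADP", "s3": "pyruvate"},
}

history = link_versions([], "v1", "v2", VERSIONS)
print(history[0]["diff"])
```

In a graph store, each history record would naturally become an edge between two version nodes, with the diff attached as a property or as links between the changed entities.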
SEMS: Methods for Model & Simulation Management
• XML version control
• Difference detection in XML
Waltemath et al., submitted
• Ranked model retrieval (Henkel et al., 2010, BMC Bioinf)
• Structure- and ontology-based search
• Relational databases (Waltemath et al., 2011, DBSpektrum)
• Graph-based storage (Henkel et al., 2012, INFORMATIK)
• Standardized encoding of simulation setups (Waltemath et al., 2011, BMC SysBiol)
• Linking models and simulation descriptions (Henkel et al., 2012, INFORMATIK)
[Diagram labels: Model Search, Model Version Control, Model Storage; Simulation VC, Simulation Search, Simulation Storage]
Take Home Message
• Ranked retrieval is a necessary feature for model databases.
• The model’s inherent structure should be queryable.
• Graph-based storage reflects a model’s encoding and evolution well.
Thanks for your attention.
Questions?