
SEMS: Model search and ranked Retrieval (Ron Henkel)


DESCRIPTION

Ron Henkel's presentation of our ranked retrieval approach at the 2012 PALs meeting of the Sysmo-SEEK project in Heidelberg, Germany, 28-30 November 2012.

Citation preview

Page 1: SEMS: Model search and ranked Retrieval (Ron Henkel)

29.11.2012 © 2009 UNIVERSITÄT ROSTOCK

Graph based storage and retrieval of computational models

Department of Systems Biology and Bioinformatics University of Rostock

www.sbi.uni-rostock.de

Ron Henkel, Martin Scharm, Dagmar Waltemath, Olaf Wolkenhauer

Page 2: SEMS: Model search and ranked Retrieval (Ron Henkel)

Motivation


[Figure: Data from BioModels Database. Number of models (axis 0 to 1000) and number of annotations (axis 0 to 120000), plotted quarterly from Apr 05 to Jul 12. Series: Models, Annotation.]

Page 3: SEMS: Model search and ranked Retrieval (Ron Henkel)

Motivation

• Models:
  • Grow in number and complexity
  • Are provided with supplementary material
  • Evolve over time


Page 4: SEMS: Model search and ranked Retrieval (Ron Henkel)

State of the Art

• Storage:
  • Relational databases
  • Model files on hard disk drive (HDD)
  • Additional files (images, result sets, papers)

• Search:
  • SQL statements
  • Faceted search
  • Data browsing


Page 5: SEMS: Model search and ranked Retrieval (Ron Henkel)

State of the Art - Demo


Page 6: SEMS: Model search and ranked Retrieval (Ron Henkel)

Available Data for Ranked Retrieval


Model file
• Constituent names
• Model code

Annotation & Ontologies
• Biochemical background
• Synonyms

A model’s network
• Model structure
• Aggregate values

Page 7: SEMS: Model search and ranked Retrieval (Ron Henkel)

Available Data for Ranked Retrieval


#  Aspect          Importance  Contained features
1  Administrative  none        ids, file name, version, formalism…
2  Person          medium      creator, encoder, submitter, publication author
3  Dates           low         creation and modification date
4  Publication     high        title, abstract, full-text, journal
5  Constituents    very high   compartment, species, reaction
6  User content    very high   keywords, tags, remarks, changes

• The concept is abstract and can be applied to different model formalisms.
• Depending on the formalism, the aspects can be refined into features.
• The model constituents also contain the annotations.

Henkel et al. (2010) BMC Bioinf
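The aspect table above can be read as a set of per-aspect weights for ranking. The following is a minimal sketch of that idea, not the scoring used by the authors or by Lucene: the numeric weights assigned to the importance levels, the aspect keys, and the functions `score_model` and `rank` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical numeric weights for the importance levels in the table.
IMPORTANCE = {"none": 0.0, "low": 0.5, "medium": 1.0, "high": 2.0, "very high": 4.0}

# Aspect -> importance level, as listed in the table.
ASPECTS = {
    "administrative": "none",
    "person": "medium",
    "dates": "low",
    "publication": "high",
    "constituents": "very high",
    "user_content": "very high",
}

def score_model(model, query):
    """Sum the query-term frequencies per aspect, weighted by importance."""
    terms = query.lower().split()
    total = 0.0
    for aspect, level in ASPECTS.items():
        counts = Counter(model.get(aspect, "").lower().split())
        hits = sum(counts[term] for term in terms)
        total += IMPORTANCE[level] * hits
    return total

def rank(models, query):
    """Return the names of matching models, best score first."""
    scored = [(score_model(model, query), name) for name, model in models.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]

# Toy data, invented for this example.
models = {
    "model_a": {"publication": "a model of the cell cycle", "constituents": "cyclin cdc2"},
    "model_b": {"constituents": "cell cycle cyclin", "dates": "2005"},
    "model_c": {"administrative": "cell cycle"},
}
print(rank(models, "cell cycle"))
```

With these assumed weights, a match in the constituents outranks a match in the publication text, and a match that occurs only in administrative data does not contribute to the ranking at all.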

Page 8: SEMS: Model search and ranked Retrieval (Ron Henkel)

BioModels Database – A Test Case

• Apache Lucene Framework

• Model Index
  425 models, 140,977 terms

• Semantic Index
  2261 URIs, 409,124 terms


http://www.ebi.ac.uk/biomodels-demo/

Page 9: SEMS: Model search and ranked Retrieval (Ron Henkel)

Demo


Page 10: SEMS: Model search and ranked Retrieval (Ron Henkel)

Improvements


• Ranking
• Enhanced query possibilities
  • Required, optional and excluded criteria
  • Allow full-text and ontology queries

• Example: “Find cell cycle models”

Query         BiomodelsDB  Using IR  Gold Standard
cell cycle    135          173       n/a
“cell cycle”  14           26        28
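The difference between the two query rows, and the required, optional and excluded criteria, can be illustrated with a small matcher. This is a sketch under stated assumptions: the function names `contains_phrase` and `matches`, the whitespace tokenisation, and the toy descriptions are invented for illustration and do not reproduce the Lucene-based implementation.

```python
def tokens(text):
    """Lower-case whitespace tokenisation (a simplification)."""
    return text.lower().split()

def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def matches(text, required=(), optional=(), excluded=()):
    """Return a score > 0 if the text satisfies the criteria, else 0."""
    if any(contains_phrase(text, term) for term in excluded):
        return 0
    if not all(contains_phrase(text, term) for term in required):
        return 0
    hits = sum(contains_phrase(text, term) for term in optional)
    if not required and hits == 0:
        return 0
    return len(required) + hits

# Toy model descriptions, invented for this example.
docs = {
    "m1": "model of the cell cycle in yeast",
    "m2": "calcium cycle in a muscle cell",
    "m3": "cell cycle arrest in cancer",
}

# Term query: either word may occur anywhere.
term_hits = [d for d, t in docs.items() if matches(t, optional=("cell", "cycle"))]
# Phrase query: the words must be adjacent and in order.
phrase_hits = [d for d, t in docs.items() if matches(t, required=("cell cycle",))]
# Phrase query with an excluded criterion.
filtered = [d for d, t in docs.items()
            if matches(t, required=("cell cycle",), excluded=("cancer",))]
print(term_hits, phrase_hits, filtered)
```

The term query returns more candidates than the phrase query, which mirrors the larger counts in the unquoted row of the table; it does not explain the specific numbers reported there.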

Page 11: SEMS: Model search and ranked Retrieval (Ron Henkel)

Available Data for Ranked Retrieval


Model based
• Model name
• Model code

Annotation & Ontologies
• Biochemical background
• Allows identifying, e.g., synonyms

A model’s network
• Include model structure
• Aggregate values

Page 12: SEMS: Model search and ranked Retrieval (Ron Henkel)


Mapping a Model to a Database

A model’s network
• Include model structure
• Aggregate values

Page 13: SEMS: Model search and ranked Retrieval (Ron Henkel)

Advantages of Graph Databases

• Easy mapping of model structure
• Fast browsing through models
• Flexible and schema-free storage
• Easy linking to models, simulation setups or results, and external resources


Page 14: SEMS: Model search and ranked Retrieval (Ron Henkel)


[Figure: graph representation of a model. Node labels: Document, Model, P, E, C, R, S. Annotation nodes: SBO:0000268, uniprot:P07101, uniprot:Q03393, GO:0005737, HGNC:8582. Edge labels: is, isVersionOf, isEncodedBy, asProduct, asReactant, asModifier.]
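The graph layout in the figure can be approximated with a small in-memory property graph. This is a rough sketch rather than the authors' database schema: the `Graph` class, the `hasModel` and `hasEntity` edge names, the node ids, and the edge directions are assumptions; only the labels is, isVersionOf, isEncodedBy, asProduct, asReactant, asModifier and the annotation URIs come from the slide.

```python
from collections import defaultdict

# Annotation edge labels as they appear in the figure.
ANNOTATION_RELATIONS = ("is", "isVersionOf", "isEncodedBy")

class Graph:
    """Minimal directed, labelled graph (hypothetical helper class)."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.out_edges = defaultdict(list)  # node id -> [(relation, target id)]
        self.in_edges = defaultdict(list)   # node id -> [(relation, source id)]

    def add_node(self, node_id, **props):
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, source, relation, target):
        self.out_edges[source].append((relation, target))
        self.in_edges[target].append((relation, source))

    def neighbours(self, node_id, relation):
        return [t for r, t in self.out_edges[node_id] if r == relation]

    def sources(self, node_id, relation):
        return [s for r, s in self.in_edges[node_id] if r == relation]

def models_annotated_with(graph, uri):
    """Walk back from an annotation node to the models that use it."""
    models = set()
    for relation in ANNOTATION_RELATIONS:
        for entity in graph.sources(uri, relation):
            models.update(graph.sources(entity, "hasEntity"))
    return sorted(models)

g = Graph()
g.add_node("doc1", kind="Document")
g.add_node("model1", kind="Model")
g.add_edge("doc1", "hasModel", "model1")        # assumed edge name
for entity, kind in (("s1", "Species"), ("s2", "Species"), ("r1", "Reaction")):
    g.add_node(entity, kind=kind)
    g.add_edge("model1", "hasEntity", entity)   # assumed edge name
g.add_node("uniprot:P07101", kind="Annotation")
g.add_edge("s1", "isVersionOf", "uniprot:P07101")
g.add_edge("s1", "asReactant", "r1")            # assumed direction
g.add_edge("s2", "asProduct", "r1")

print(models_annotated_with(g, "uniprot:P07101"))
```

Because annotations are ordinary nodes, a search for an ontology term becomes a short traversal from that node back to the models, which is one way to read the "fast browsing" and "easy linking" advantages listed earlier.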

Page 15: SEMS: Model search and ranked Retrieval (Ron Henkel)


Page 16: SEMS: Model search and ranked Retrieval (Ron Henkel)

Preliminary Results

• All models from BioModels Database were imported into the graph database

• Implemented storage and search in Jummp
  • Official demo release upcoming

• Added 140,811 models from the path2models project
  • Done, but including the annotations exceeds the available memory
  • The database scales well and is reasonably fast


Page 17: SEMS: Model search and ranked Retrieval (Ron Henkel)

Demo


Page 18: SEMS: Model search and ranked Retrieval (Ron Henkel)

Future Work: Relate model versions

• Link successor and predecessor
• Relate changed entities
• Store the diff

• Enable version control for multi-document models
• Propagate changes for imported models
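The first three points above (linking versions, relating changed entities, storing the diff) can be sketched as follows. This is only an illustration of the planned idea, assuming a linear version history and models reduced to `{entity id: definition}` dictionaries; `VersionStore`, `diff_entities`, and the toy entities are hypothetical and are not part of any released tool.

```python
def diff_entities(old, new):
    """Compare two versions given as {entity id: definition} dicts."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(key for key in shared if old[key] != new[key]),
    }

class VersionStore:
    """Keeps versions, predecessor/successor links and the stored diffs."""

    def __init__(self):
        self.versions = {}     # version id -> entities
        self.successor = {}    # version id -> next version id
        self.predecessor = {}  # version id -> previous version id
        self.diffs = {}        # (old id, new id) -> diff

    def add_version(self, version_id, entities, predecessor=None):
        self.versions[version_id] = dict(entities)
        if predecessor is not None:
            self.successor[predecessor] = version_id
            self.predecessor[version_id] = predecessor
            self.diffs[(predecessor, version_id)] = diff_entities(
                self.versions[predecessor], entities
            )

    def history(self, version_id):
        """Return the chain of versions from the oldest up to version_id."""
        chain = [version_id]
        while chain[-1] in self.predecessor:
            chain.append(self.predecessor[chain[-1]])
        return chain[::-1]

# Toy versions, invented for this example.
store = VersionStore()
store.add_version("v1", {"s1": "glucose", "r1": "s1 -> s2", "s2": "lactate"})
store.add_version(
    "v2", {"s1": "glucose", "r1": "s1 -> s3", "s3": "pyruvate"}, predecessor="v1"
)
print(store.history("v2"))
print(store.diffs[("v1", "v2")])
```

Storing the diff on the link between two versions keeps each version retrievable on its own while still making the changed entities directly queryable. Branching histories, multi-document models and change propagation would need more than this sketch provides.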


Page 19: SEMS: Model search and ranked Retrieval (Ron Henkel)


SEMS: Methods for Model & Simulation Management

• XML version control
• Difference detection in XML
  Waltemath et al., submitted

• Ranked model retrieval
  Henkel et al., 2010 (BMC Bioinf)
• Structure- and ontology-based search

Simulation VC, Simulation Search, Simulation Storage

• Relational databases
  Waltemath et al., 2011 (DBSpektrum)
• Graph-based storage
  Henkel et al., 2012 (INFORMATIK)

• Standardized encoding of simulation setups
  Waltemath et al., 2011 (BMC SysBiol)
• Linking models and simulation descriptions
  Henkel et al., 2012 (INFORMATIK)

Model Search, Model Version Control, Model Storage

Page 20: SEMS: Model search and ranked Retrieval (Ron Henkel)

Take Home Message

• Ranked retrieval is a necessary feature for model databases.
• The model’s inherent structure should be queryable.
• Graph-based storage reflects a model’s encoding and evolution well.


Page 21: SEMS: Model search and ranked Retrieval (Ron Henkel)

Thanks for your attention.

Questions?

