Systems Biology & Bioinformatics, Rostock
SEMS: simulation experiment management system
The CellML models’ walk through the repository
A retrospective study
Martin Scharm, Dagmar Waltemath
Department of Systems Biology & Bioinformatics, University of Rostock
http://sems.uni-rostock.de
8th International CellML Workshop, Auckland, New Zealand, 2014
Apr 14, 2014 SEMS | Martin Scharm, Dagmar Waltemath 1
SEMS: simulation experiment management system
[Diagram: SEMS overview]
• Components: Graph Database, Version Control, Retrieval, Ranking
• Functions: store, track development, retrieve, rank
• Versions: Version 1, Version 2, latest, linked by deltas (∆)
http://sems.uni-rostock.de/
The CellML Model Repository
[Figure: number of models in the repository by year, 2007 to 2013; y-axis ticks from 0 to 2500]
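Version Control: good news
[Figure: two versions of a reaction network. The first contains A, B, C, D and reaction r, labelled cycE/cdk2, RB/E2F, RB-Hypo, free E2F. After new insights, the second adds E, reaction s and the label RB-Phos.]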
Waltemath et al.: Improving the reuse of computational models through version control. Bioinformatics (2013) 29(6): 742-748.
BiVeS: Difference Detection
[Figure: the two versions of the reaction network (first: A, B, C, D, r; second: A, B, C, D, E, r, s) and the corresponding trees, whose nodes are mapped onto each other]
Biochemical Model Version Control System
• compares models encoded in standardized formats (currently: SBML and CellML)
• maps hierarchically structured content
• constructs a delta (in XML format)
• can interpret this delta
<XML> Diff
  moves: product of r: C
  deletes: product of r: B
  inserts: species: E; product of r: E; reaction s
</XML>
Steps: mapping, then diff construction
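The operations listed in the diff above can be tallied per type with standard tools. The following is a minimal sketch, not part of BiVeS: the function name tally_operations and the one-element-per-line file layout (elements named move, delete, insert, update) are assumptions made for illustration and do not reflect the actual BiVeS XML schema.

```shell
# Tally operations per type in a delta file given as $1.
# ASSUMPTION: the file holds one operation element per line, named
# <update>, <delete>, <insert> or <move>. This layout is hypothetical;
# it is NOT the real BiVeS delta format.
tally_operations() {
    for op in update delete insert move; do
        # grep -c prints 0 but exits non-zero when nothing matches,
        # so "|| true" keeps the loop going for absent operation types.
        n=$(grep -c "<${op}[ />]" "$1" || true)
        printf '%s %s\n' "$op" "$n"
    done
}
```

Called on a file that encodes the example above (one move, one delete, three inserts), the function prints one line per operation type with its count.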
BudHat: Diff Visualization
[Figure: the two versions of the reaction network, their mapped trees and the XML diff (moves product of r: C; deletes product of r: B; inserts species E, product of r: E, reaction s), shown as input to BudHat]
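• calls BiVeS to construct the diff
• displays the result in various formats:
  • the XML diff
  • a reaction network highlighting the changes, using [tool logo not preserved]
  • a human-readable report
[Figure: merged reaction network with A, B, C, D, E and reactions r, s, with the changes highlighted]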
The CellML Model Repository
# $repositories: whitespace-separated list of repository URLs
for r in $repositories
do
    hg clone "$r"
done
• CellML files: 2702
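Once the repositories are cloned, a file count like the one above could be obtained as sketched below. The slides do not say how the files were identified; matching on the .cellml extension and the function name count_cellml_files are assumptions.

```shell
# Count files ending in .cellml below a directory (default: current directory).
# ASSUMPTION: CellML files are recognised by their extension only.
# Mercurial's own .hg metadata directories are skipped.
count_cellml_files() {
    find "${1:-.}" -type d -name .hg -prune -o -type f -name '*.cellml' -print |
        wc -l | tr -d ' '
}
```

Run from the directory that holds the clones, count_cellml_files prints a single number. Files named differently (for example .xml) would be missed by this sketch.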
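The CellML Model Repository
[Diagram: models M1, M2, M3, each with versions 1 to 3; deltas (∆) link versions of the same model]
• CellML files: 2702
• Versions: 18329
• #Deltas: 15626
• #relevant: 2833
• #operations: 355.31 (non-integer, so presumably a mean per delta)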
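The relation between versions and deltas can be sketched as follows, assuming a delta is computed only between consecutive versions of the same file, so that a file with n versions yields n - 1 deltas. The slides do not confirm this rule; the input format and the function name count_consecutive_deltas are hypothetical.

```shell
# Read lines of the form "<file name> <number of versions>" on stdin
# (a made-up input format) and print the number of consecutive-version
# pairs, i.e. the sum of (versions - 1) over all files.
count_consecutive_deltas() {
    awk '$2 > 0 { pairs += $2 - 1 } END { print pairs + 0 }'
}
```

Under this assumption, the totals above would give 18329 - 2702 = 15627 pairs, one more than the 15626 deltas reported; the slides do not explain the difference.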
The CellML Model Repository: Growth
[Figure: number of models in the repository and average number of nodes in an XML tree, by year, 2007 to 2013; axis ticks from 0 to 2500]
Dec 3rd, 2010: about 800 models with a mean size of ≈56 nodes per model, in repositories SVP_0000*
“Here we describe the development of an online repository of Standard Virtual Biological Parts (SVPs) – mathematical model components describing the function of SBPs [(Standard Biological Parts)] which can be downloaded, extended and recombined to aid the design, in silico, of synthetic biological systems.”
The CellML Model Repository: Growth
[Figure: mean number of units, imports and components (left axis, 0 to 20) and mean number of variables (right axis, 0 to 150), by year, 2007 to 2013]
The CellML Model Repository: Types of Operation
[Figure: example trees illustrating the operation types, with nodes A, B, C, D, E, F before and A, B, G, D, E, C after, and an attribute change from x="y" to x="z"]
[Figure: distribution of the number of operations on a logarithmic axis from 1 to 1000, shown separately for updates, deletes, inserts and moves; the median (50%) is marked]
The CellML Model Repository: Modification Targets
<component name="Template_Species">
  <variable name="time" units="second" public_interface="in"/>
  <variable name="concentration" units="nM" initial_value="concInit" public_interface="out"/>
  <variable name="JGain" units="nM_per_s" public_interface="in"/>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <apply>
      <eq/>
      <apply>
        <diff/>
        <bvar>
          <ci>time</ci>
        </bvar>
        <ci>concentration</ci>
      </apply>
      <ci>JGain</ci>
    </apply>
  </math>
</component>
[Figure: distribution of the number of modifications on a logarithmic axis from 1 to 1000, shown separately for document nodes, attributes and text nodes]
The CellML Model Repository: Operations
Novak1993: 12 updates, 20 moves, 80 inserts, 20 deletes
Thank you for your attention!
SEMS group
Dagmar Waltemath, Ron Henkel, Martin Peters, Markus Wolfien, Rebekka Alm, Olaf Wolkenhauer
@SemsProject
http://sems.uni-rostock.de